
The New Year speeches of two scientists on AI for science

Muqiu · 2025-01-29 11:55
Exploring the Essence of Intelligence and the Transformation of Education.

On the afternoon of January 12, 2025, the Zhishi Qianyan Technology Promotion Center, a scientific public-welfare institution in Haidian District, Beijing, held its annual science event under the theme "AI for Science, AI for Good".

Yi Ma, dean of the School of Computing and Data Science at the University of Hong Kong and founder of Yisheng Technology, and Zhang Zheng, director of the Amazon Web Services Shanghai Artificial Intelligence Research Institute, delivered New Year science speeches titled "Exploring the Road to the Essence of Intelligence" and "In the Era of Large Models, New Challenges in Education - From Assembly Line to Renaissance", respectively. The following is a summary of their main points:

Yi Ma: Exploring the Road to the Essence of Intelligence

1. I have previously quoted Einstein's remark about science: "Everything should be made as simple as possible, but not simpler." We should explain the world as simply as we can, seeking its laws in the most economical form, yet not oversimplify, or the explanation no longer accounts for the phenomena. In my view, this principle captures the essence of intelligence.

2. DNA is nature's first large model. Early life relied on DNA: random variation from generation to generation, survival of the fittest, continual modification by trial and error, and passing the result on. The individual has little intelligence, but through natural selection the population does. This process now has a very fashionable name: reinforcement learning. It is not that it cannot make progress, but the cost is enormous; many failures are sacrificed for each success. Today's large models work the same way. We do not understand their mechanism, so teams everywhere keep trying and erring, a hundred-model free-for-all in which only the fittest survive. The mechanism is the same, the phenomenon is the same, and so is the price: you cannot play this game without hundreds of millions of dollars.

3. Five hundred million years ago, individual nervous systems appeared and eyes began to emerge. Individuals could obtain information from the external world, triggering the Cambrian explosion of life. To a degree, the brain took over part of the role of DNA, and individuals gained intelligence of their own. So in biological species, alongside the inherited intelligence of genes and natural selection, individuals acquired the intelligence of lifelong learning and adaptation. This was a major leap in the mechanism of intelligence.

4. Later, with humans, animals began to live in groups and to exchange information, and language and writing appeared. The mechanism of intelligence improved again: learning was no longer purely individual, because what I learn can be passed on through language and writing. Linguistic civilization replaced another part of the role of DNA. This is group intelligence.

5. Several thousand years ago came another event: mathematics and science. Humans acquired the power of abstraction, and much of our knowledge now goes beyond what can be extracted from empirical data. This is human intelligence. To predict the future, one must understand history, and anyone doing serious research must get the history straight. Where did the study of intelligence truly begin? People today often point to the AI of five or six years ago, which is completely wrong. Those genuinely interested in intelligence go back to the 1940s, when many scientists hoped machines could emulate the abilities of animals or humans: how useful information is stored and transmitted; Wiener's "Cybernetics", on how a system improves its own decisions through feedback; von Neumann's "Game Theory"; and the first mathematical model of the artificial neural network, which asked how the brain learns, what kind of system can simulate perception of the external world, and what its mechanism is. These pioneers believed the mathematical mechanisms behind intelligence are unified: once you find those mechanisms, animal and machine are indistinguishable.

6. In the past decade, the neural network has indeed been remarkable. Since 2012, with the support of computing power and data, deep networks became practical, and text, images, and even science have advanced by leaps and bounds. Mainly, a mechanism recognized long ago finally became technically feasible, yet many came to believe our understanding itself had progressed. Some of my former colleagues even say that perhaps a black box is enough, as long as it works. From an engineering perspective that is to some extent acceptable, but from a scientific perspective it is not. Anyone who understands history knows that whatever is highly influential yet remains a black box will be exploited; it has been so since ancient times. For this reason alone, we must figure out what intelligence is and what neural networks are actually doing.

7. How to define intelligence as a scientific problem, what exactly its scientific and mathematical questions are, and how to prove things with correct scientific method should now be on the agenda. Otherwise many people will hype it or fear it. Atomic bombs and viruses become big problems when they are not understood. This is the responsibility of the scientists present: it must be clarified. We must truly turn intelligence into a scientific problem and state clearly what intelligence is supposed to learn and do, why life can exist, and what its basic mechanism is. Then come the questions of how to learn, why neural networks arise, and how to do this correctly, well, and efficiently. These are questions we must answer.

8. Everyone, even every cat and dog, is Newton; they just don't know it. Animals build very accurate physical models of the external world. When an object falls, a bird or a cat can catch it quickly, sometimes faster than a human, using laws learned from experience to make accurate predictions about the physical world. Newton's laws describe what cats and dogs have already learned; only the language and the form differ.

9. Picture data along a line with a gap in it; if you know how to fill in the blank, that is what AI does. GPT is doing cloze tests; that is what the Transformer does. What else can be done? Denoising. We observe data corrupted by noise; once we have found the regularities, we can remove the noise. Blurry images can be cleaned up this way, and the AI-generated sounds and images you now see and hear are produced by doing exactly this. What else? Error correction. I observe something that violates the rules I have learned, or something that is partially occluded. Our brain does this constantly: I do not need to see everything, because I can fill in what is missing. Damaged signals can be restored to a degree far beyond intuition. This is what these systems are doing.
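The "cloze" view can be sketched in a few lines of code (a toy statistical illustration of the idea, not how GPT or a Transformer is actually implemented; the corpus and function names here are invented). The script learns character statistics from a small text and fills in a masked character with the one most consistent with its surrounding context:

```python
from collections import Counter

# Toy corpus; any text will do.
corpus = "the cat sat on the mat. the cat ate the rat."

# Learn trigram statistics: which middle character appears between a given pair.
stats = {}
for left, mid, right in zip(corpus, corpus[1:], corpus[2:]):
    stats.setdefault((left, right), Counter())[mid] += 1

def fill_blank(text, i):
    """Fill position i of `text` with the char most consistent with its context."""
    context = (text[i - 1], text[i + 1])
    counts = stats.get(context)
    if not counts:
        return "?"  # context never seen: cannot fill the blank
    return counts.most_common(1)[0][0]

masked = "the c_t sat"          # '_' marks the missing character
print(fill_blank(masked, 5))    # -> 'a' (most likely char between 'c' and 't')
```

Denoising and error correction follow the same pattern: once the statistics of "what should be there" are learned, the observation that violates them can be repaired.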

10. So our entire unified mathematical problem is to learn the distribution of high-dimensional data and then organize and structure it. That is what the brain does: find the correlations in the data and extract the rules. An image may live in a space of a million or ten million pixels, yet its underlying structure has only a few dimensions. The universe is endlessly varied, but how many dimensions does its model need? Some mathematicians say 9 dimensions are enough, or 11: every physical phenomenon observed from the Big Bang to now can be described within a 9- or 11-dimensional space. The rules are very simple; it is the phenomena that are endlessly varied.
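A minimal sketch of "high-dimensional data, low-dimensional structure" (an illustration of the general idea, not the speaker's specific method; all numbers are invented for the example): generate points in a 1,000-dimensional space that actually arise from only 3 latent factors, and a singular value decomposition recovers that hidden dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 samples in a 1000-dim ambient space, generated from only 3 latent factors.
latent = rng.normal(size=(500, 3))            # the "true" low-dim structure
mixing = rng.normal(size=(3, 1000))           # embeds it into 1000 dimensions
data = latent @ mixing + 0.01 * rng.normal(size=(500, 1000))  # plus small noise

# The singular values reveal the structure: ~3 large ones, the rest near zero.
singular_values = np.linalg.svd(data, compute_uv=False)
print(singular_values[:6].round(1))   # first 3 dominate; the rest collapse
```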

11. How do we learn, from experience to principle, and what is the neural network actually doing? During learning we need to find the distribution of the data and reduce its entropy to uncover its rules. How? The objective is a very complex function, but everyone knows how to climb a hill: local optimization. Nature is not that smart either. It may not know the global answer, but it knows how to make the current situation a little better, optimizing step by step, organizing the incoming data bit by bit to reduce the entropy bit by bit. A neural network does exactly this at each layer, making the output a little better organized than the input. Seen this way, the entire role of the network becomes obvious: it is performing compression, implementing the mathematical operators that achieve this objective. And those operators can be derived directly: take the derivative of the objective function and do gradient descent, and the resulting operator turns out to have the structure of a Transformer, only more concise. The learned mathematics and structure carry clear statistical and geometric meaning, namely clustering and classification. Once you fully understand the network's goal, you can design it: what each layer is supposed to achieve is clear at a glance, and it is completely interpretable and controllable. What every operator and every parameter is doing can be made explicit.
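The "layers as optimization steps" idea can be sketched concretely (a generic unrolled-gradient-descent illustration under an invented toy objective, not the actual operator derivation in the speaker's white-box work): each "layer" applies one gradient step on an explicit objective, so stacking L layers is running L steps of optimization, and every parameter has a stated meaning.

```python
import numpy as np

rng = np.random.default_rng(1)

# 8 orthonormal directions spanning a target subspace of a 64-dim space.
Q, _ = np.linalg.qr(rng.normal(size=(64, 8)))
D = Q.T   # shape (8, 64), orthonormal rows

def grad(x, D):
    # Gradient of the objective 0.5 * ||(I - P) x||^2, where P = D.T @ D
    # projects onto the subspace; since I - P is symmetric and idempotent,
    # the gradient is simply the residual (I - P) x.
    return x - D.T @ (D @ x)

def layer(x, D, step=0.5):
    """One 'layer' = one gradient-descent step on the explicit objective."""
    return x - step * grad(x, D)

x = rng.normal(size=64)
for _ in range(10):          # a 10-layer "network" = 10 optimization steps
    x = layer(x, D)

# After the layers, x lies almost entirely in the subspace: objective achieved.
print(np.linalg.norm(grad(x, D)))   # residual norm, close to 0
```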

12. Designs that were originally arrived at through experience, at a cost of billions of dollars, contain many redundancies and unclear parts; starting from white-box computation, these can be removed. The current Transformer has quadratic complexity, and the optimization can be turned into an operator with linear complexity, not guessed but calculated, which is more efficient. Everything unnecessary can be eliminated.
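To make the complexity point concrete (a generic linear-attention sketch, not the specific operator derived in the talk; the feature map below is an invented stand-in): standard attention materializes an n-by-n score matrix, quadratic in sequence length n, while reassociating the same products after a feature map yields a linear-time variant.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the n x n score matrix makes this O(n^2) in length n.
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernelized variant: with scores phi(Q) @ phi(K).T, we can reassociate
    #   (phi(Q) @ phi(K).T) @ V  ==  phi(Q) @ (phi(K).T @ V),
    # so the n x n matrix never materializes -- O(n) in sequence length.
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                      # d x d summary, independent of n
    normalizer = Qf @ Kf.sum(axis=0)   # per-row normalization
    return (Qf @ KV) / normalizer[:, None]

rng = np.random.default_rng(2)
n, d = 512, 16
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```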

13. So far this is only learning: learning the distribution from external data and organizing it well. But you do not know whether you did it right, whether anything is missing, whether the data was sufficient, or whether your memory is complete. How do you verify that the model obtained after compression and denoising is adequate? There is only one way: go back and use it, to predict. To verify that our books and memories are complete, we must go back and check. This is what this year's Nobel Prize winners were doing: trying to do autoencoding well. Their method at the time was inspired by physics, and in hindsight it does not look quite right, but the problem was the right one. How do we do it now? We know we are doing compression, and all the designs are white-box, without any guessing; every operator is answered mathematically, very clearly. The effect matches, or even exceeds, the empirically designed MAE.
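The closed-loop idea can be illustrated with a tiny linear autoencoder (an invented toy example of the general principle, not the white-box design described in the talk): after compressing, go back and decode, and use the reconstruction error to verify whether the compressed representation was sufficient.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data with 4-dimensional structure embedded in 50 dimensions.
data = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 50))

def autoencode(X, k):
    """Encode by projecting onto the top-k principal directions, then decode."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    encode = Vt[:k].T            # 50 -> k compression
    code = X @ encode
    return code @ encode.T       # k -> 50 reconstruction

# Closed-loop check: is the compressed model sufficient to reproduce the data?
for k in (2, 4, 6):
    err = np.linalg.norm(data - autoencode(data, k)) / np.linalg.norm(data)
    print(k, round(err, 4))   # error collapses once k reaches the true dimension (4)
```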

14. One more thing: is encoding alone enough? Nature has no such arrangement. Does every cat and dog carry a complete memory of the world? No. All our learning happens in the brain; we cannot control the external world, and nature gives no second chances. A goat that sees a tiger charging cannot say: wait a moment while I measure your distance and speed, I'm not very good at it yet. Such animals were eliminated long ago. All natural learning is autonomous learning. Why do some people now insist that models must be trained at massive scale? Very simple: they want to sell you data and sell you chips. That kind of training is extremely costly, while ants and small animals learn autonomously and efficiently without much data, because the mechanism is different.

15. Your brain has been learning every day since childhood, yet what you learned before is not forgotten. A closed-loop system does not forget, and living organisms exhibit exactly this characteristic: they organize memory accordingly. In the monkey brain, memory is very well organized as an orthogonal space with sparse representations, learned through closed loops, feedback, and self-control. These mechanisms can all be observed in nature.
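The point about orthogonal memory can be illustrated with a toy model (an invented example of the general principle, not a model of the monkey brain): if different memories are stored in mutually orthogonal subspaces, writing a new one does not disturb the old ones.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 64

# Carve the 64-dim space into orthogonal 8-dim subspaces, one per memory "slot".
basis, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
slots = [basis[:, 8 * i : 8 * (i + 1)] for i in range(8)]

def write(memory, slot, pattern):
    # Store the pattern inside one orthogonal subspace.
    return memory + slots[slot] @ pattern

def read(memory, slot):
    # Project onto the slot's subspace to recall what was stored there.
    return slots[slot].T @ memory

m = write(np.zeros(dim), 0, np.ones(8))   # learn memory A
m = write(m, 1, np.arange(8.0))           # learn memory B later
print(read(m, 0))   # memory A intact: all ones, undisturbed by writing B
```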

16. I suggest that young people today read the history carefully instead of jumping straight into whatever artificial intelligence currently means; these questions were all being discussed back then. The young researchers at Dartmouth deliberately kept their distance from Wiener and von Neumann. They wanted to stand out and to do something different from the animal side of intelligence, perception and prediction. What are humans doing? In 1950, Turing proposed the Turing Test, and they wanted to know how humans solve abstract problems and prove theorems. That is human intelligence. Compare what we have achieved in the past ten years with the machine intelligence of the 1940s, with animal intelligence, and with the human intelligence pursued in the 1950s, and you will see which it is closer to. The artificial intelligence of the past ten years still lags far behind.

17. Science uses two methods, induction and deduction; both have their place, and they complement each other. The rapid technological progress of the past ten years relied mainly on induction. But I hope that in the next ten years, if intelligence is to become a genuine problem of science and mathematics, it will acquire a sound mathematical theoretical framework. This is also what the computer pioneers called for: returning to the theoretical cornerstone and exploring the essence of intelligence. After so much empirical training, this is an era calling for heroes. The greatest truths are the simplest. Find the mechanism, the principle, the idea behind intelligence: more ideas, less technology.

Zhang Zheng: In the Era of Large Models, New Challenges in Education - From Assembly Line to Renaissance

1. The development of technology should be viewed against the long arc of human history. Someone on the Internet once summarized it this way: if you treat the past 250,000 years as a book with each page covering 250 years, most of the book is blank; agricultural society appears only near the end. That is natural, but such a book creates the illusion that earlier humans were just lying around or dazing about, doing nothing. "A Brief History of Humankind" makes the well-known claim that humanity's progress, or regression, came from being domesticated by wheat. Because it is a brief history, it leaves the impression that this happened very suddenly. In fact, it took roughly a thousand years for agriculture to become a way of life. Humans experimented with agriculture for a long time without immediately abandoning hunting and gathering, trying many different lifestyles before finally settling into an agricultural life with wheat as the main energy source. In other words, we cannot say the claim that wheat domesticated humans is wrong; but looking back at the history, our ancestors were making their own choices and optimizations at the time.

2. Treat ourselves as one kind of intelligent agent and the large model as another, and compare them. Everyone knows the education system: it is an assembly line, from primary school to middle school to university and then on to higher degrees, crossing the single-plank bridge and then walking the tightrope to become various specialized talents: scientists, engineers, doctors, lawyers, managers, writers. This is the assembly line of today's education, and its defining feature is that it is highly modular and highly standardized. Why? Because we want it to be an efficient assembly line. In the AI era, some parts of it can be adjusted: some people can learn faster, some a bit more slowly. Some studies say each generation's measured IQ is slightly higher than the last, but this reflects rising abstract-thinking ability, a result of urban life, not that we are getting smarter. What product does this assembly line produce? We take a specialized expert in a single field, someone who can publish papers and perhaps also understands adjacent fields, as the mark of success. That is the relatively successful product of our current talent assembly line.

3. There is another kind of assembly line that sounds quite unreasonable: rote memorization. First memorize, then do as I say, and you will be shaped into a good intelligent agent of a certain kind. Does that sound reasonable? Yet this is precisely the path large language models have taken. Their first task, pre-training, is to continuously memorize the next word. The difference is the sheer quantity: GPT-3's training sample was initially about 1.5 million books. Taking myself as a benchmark, in a good year I could read at most 20 books, and now I consider it remarkable to finish 5 in a year. Generously, a person reads perhaps 1,000 books in a lifetime, while GPT-3 read 1.5 million books in 3 months, roughly 1,500 lifetimes of reading.

4. This is essentially a training program. What the training does is predict the next character: not a random character, but the one that conforms to the statistical regularities of the text. Given the previous X characters, I know which character is most likely at position X + 1. That is the first step. The second step, "do as I say", is very subtle. The idea is: I have some examples; for instance, here is an article, summarize it. There are roughly a dozen such task types: summarizing, answering questions, brainstorming, extracting information. Why these? Because most of the notable things each of us does in our daily work fall into those types. The unexpected part is that once a large language model learns N such abilities, it can combine them. For example, someone emails me asking me to speak at a meeting; I first summarize the request and then decline or accept in a tactful way, combining several abilities. That is the second step. The third step is comparatively simple: the carrot and the stick. Reinforcement learning is used to align the model's values and make it more obedient to humans. Interestingly, the target is to be helpful, honest, and harmless. That is its learning method.
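Step one, next-token prediction, can be sketched in miniature (a toy word-level bigram counter, invented for illustration; real pre-training uses neural networks over subword tokens at vastly larger scale): given the previous word, predict the statistically most likely next word.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ate".split()

# Count which word follows which: a crude stand-in for pre-training.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word given the previous one."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))   # -> 'cat' (follows 'the' twice, vs 'mat' once)
```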

5. Let us first discuss the nature of the data itself. The one on the left is the normal distribution. Whenever an outcome is the aggregate of many independent factors, it ends up normally distributed. I am