After leaving OpenAI, Ilya Sutskever had a 1.5-hour conversation: AGI can be achieved in as little as 5 years.
- Sutskever predicts that human-level AGI will be achieved within 5 to 20 years.
- The "scaling" approach of simply piling up data and computing power has reached its peak.
- There are "jagged" gaps in model capabilities: models perform excellently in evaluations but often make elementary mistakes in applications.
- Value functions, similar to human emotions, can guide AI to learn more efficiently and robustly.
- Not rushing into commercialization: Focus on research aiming for "direct access to superintelligence" and also consider gradual deployment.
- Letting AIs compete with each other and think differently is one way to break the "model homogenization" problem.
- Good research should be concise, elegant, and correctly draw inspiration from brain mechanisms.
Early on November 26th, Ilya Sutskever, co-founder of OpenAI and CEO of the superintelligence company Safe Superintelligence Inc. (SSI), was interviewed by well-known podcast host Dwarkesh Patel. They discussed SSI's strategy, problems with pre-training, how to improve the generalization ability of AI models, and how to ensure AGI develops smoothly.
The following is the transcript of the interview with Ilya Sutskever:
01 Model capabilities are "jagged": Perfect scores in exams, but failures in real-world applications?
Patel: Where should we start our discussion?
Sutskever: Do you know what's truly incredible? All these AI technologies have actually become a reality. Doesn't it seem like something straight out of a science-fiction novel?
Patel: Indeed. Another surprising phenomenon is that the incremental development of AI feels so ordinary. Imagine if we invest 1% of GDP in the AI field. This should be a huge event, but now it seems taken for granted.
Sutskever: Humans do adapt to things very quickly, and the current development of AI is still relatively abstract. You only read in the news that a company has announced a huge investment, but you don't feel the real impact in your daily life.
Patel: Do you think this situation will continue?
Sutskever: I don't think so. Although many investments in the AI field are currently hard to understand, AI will eventually penetrate all sectors of the economy, generating a powerful economic driving force, and its impact will become more and more obvious.
Patel: When do you expect the actual economic impact of AI to appear? Although current AI technologies seem very powerful, the economic value they create in practical applications is not that significant.
Sutskever: That's true. This is one of the most confusing phenomena in the current AI field. How do we explain that models perform excellently on evaluations, yet their economic contribution lags far behind? The evaluation problems are quite hard and the models solve them well, but in real applications the effect is much weaker. For example, models sometimes repeat the same mistakes over and over, which is genuinely puzzling.
Let me give you a specific example: suppose you hit a bug while vibe coding and ask the model to fix it. It complies, but introduces a new bug in the process. When you point out the new bug, it admits the mistake again, but brings back the original one. This kind of cycle happens often. The exact cause isn't clear, but it shows something is genuinely off in the system.
I have two possible explanations. One is that perhaps reinforcement learning training makes the model too focused and narrow, lacking a certain "awareness". As a result, it can't do some basic things well.
Another explanation is that pre-training uses all the data, while reinforcement learning training requires selecting specific training environments. When designing these environments, there are so many variables that one may accidentally optimize for certain evaluation goals while ignoring the needs of practical applications.
This can also explain the disconnect between evaluation performance and real-world results, especially the models' poor generalization. Simply put, excellent performance in evaluations does not always translate into success in practical applications, mainly because the training environments deviate from real-world goals.
Patel: I like your view: The real "reward hackers" are actually human researchers who focus too much on evaluations.
The problem you mentioned can be viewed from two perspectives. Firstly, if a model only performs well in programming competitions, it doesn't mean it can make better judgments or more "tasteful" improvements in other tasks. Therefore, the training environment needs to be expanded. In addition to programming competitions, the performance of models in other tasks, such as tasks X, Y, and Z, should also be evaluated.
Secondly, why doesn't excellent performance in programming competitions necessarily make a model a more tasteful programmer? Maybe the problem is not about increasing the number of training environments, but about how to enable the model to learn in one environment and apply these experiences to other tasks.
Sutskever: I can illustrate with a human analogy. Take programming competitions again: suppose there are two students. One is determined to become the best competitive programmer, so he spends ten thousand hours practicing, solving every problem, memorizing every technique, and learning to implement every algorithm quickly and reliably, and he finally becomes a top-level competitor. The other student thinks competitive programming is cool but only practices for a hundred hours, far less than the first, yet still performs quite well. Who do you think will do better later in their career?
Patel: The second student.
Sutskever: Right. I think the situation of models is more similar to that of the first student, or even more extreme. Current models are like "exam experts". In order to make them proficient in programming competitions, we cram them with a large number of questions. As a result, although they become good at answering questions, they still have difficulty applying the learned knowledge flexibly to other tasks.
Patel: But what is the analogy for the second student before the hundred hours of fine-tuning?
Sutskever: I think they have "a certain trait". I met such students when I was an undergraduate. I know such people exist.
Patel: I find it interesting to distinguish "a certain trait" from what pre-training does. One way to read your point that pre-training doesn't need curated data is that it's not entirely different from ten thousand hours of practice, except those ten thousand hours come "for free" because they already exist somewhere in the pre-training data distribution. But maybe you're implying that the generalization you get from pre-training isn't that strong. The amount of data in pre-training is indeed huge, but it may not generalize better than reinforcement learning.
Sutskever: The main advantages of pre-training are twofold. Firstly, the data volume is huge. Secondly, you don't need to worry about what data to use for pre-training. These are very natural data, containing various human behaviors, ideas, and characteristics. It's like the whole world is projected onto text through humans, and pre-training tries to capture all of this with a large amount of data.
It's difficult to reason about pre-training because it's hard for us to understand the specific way the model depends on pre-training data. When the model makes mistakes, is it because something just doesn't get enough support from the pre-training data? "Getting pre-training support" may be a rather loose statement, and I'm not sure if I can add more useful content. I don't think there is a perfect human analogy for pre-training.
02 Value functions: Is the "emotional system" of AI here?
Patel: People have proposed several analogies between humans and pre-training. I'm curious why you think they might be inaccurate. One analogy is to regard the first 15 or 18 years of a person's life as the pre-training stage. At that time, they are not economically productive but are learning to better understand the world. Another analogy is to regard evolution as a kind of search that has been going on for 3 billion years and finally produced humans. I'm curious if you think either of these two situations is similar to pre-training. If not pre-training, how do you view the process of human lifelong learning?
Sutskever: I think both have some similarities with pre-training, and pre-training tries to play the roles of both, but there are also significant differences; after all, the amount of pre-training data is very, very large.
But strangely, even after living for 15 years, a human has only been exposed to a small part of the pre - training data. They know much less, but whatever they know, they seem to understand it much more deeply. At that age, you won't make the mistakes our AIs make.
There is another thing. You may ask if it's something like evolution. Maybe the answer is yes. But in this case, I think evolution may have more advantages. I remember reading some cases where neuroscientists studied people with damage to different parts of the brain to understand brain functions. Some people have the strangest symptoms you can imagine, which is actually very interesting.
I thought of a relevant case. I read about a person who suffered brain damage due to a stroke or an accident, which damaged his emotional processing ability, so he no longer felt any emotions. He was still articulate, could solve some small puzzles, and seemed completely normal in tests. But he couldn't feel emotions: no sadness, no anger, no vitality. As a result, he became extremely bad at making any decisions. It took him several hours to decide which pair of socks to wear, and he made very bad financial decisions.
This shows that our internal emotions play an important role in making us viable intelligent agents.
Patel: What is "a certain trait"? Obviously, it's not directly emotions. It seems to be something close to a value function, telling you what the ultimate reward for any decision should be. Do you think this will implicitly emerge from pre - training?
Sutskever: I think it's possible. I'm just saying it's not 100% obvious.
Patel: But what exactly is it? How do you view emotions? What is the machine-learning analogy for emotions?
Sutskever: It should be something like a value function. But I don't think there is a good machine-learning analogy at present, because value functions don't play a very prominent role in people's current work.
Patel: Maybe you can define what a value function is for us.
Sutskever: Sure! In reinforcement learning, the current typical approach is as follows: You have a neural network, give it a problem, and then tell the model: "Go and solve it." The model will perform thousands or tens of thousands of actions or thinking steps and finally get a solution. This solution will be evaluated and scored.
This score is then used as the training signal for every step of the model's actions. In other words, if the model takes a long time to reach the final solution, it doesn't learn any useful information along the way until the final answer comes out. This approach is very common in reinforcement learning and is roughly the strategy used by models like OpenAI o1 and DeepSeek R1.
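To make the setup he describes concrete, here is a toy sketch in Python. It is purely illustrative: the 8-bit "problem", the TARGET answer, and the update rule are invented for this example and are not how o1 or R1 are actually trained. The model emits a sequence of steps, only the finished answer is graded, and that single score is the only learning signal for every step:

```python
import random

# Toy outcome-only RL, in the spirit of the setup described above (illustration only).
# The "problem" is to reproduce a hidden 8-bit answer one bit at a time. Only the
# finished answer is graded, and that single score is broadcast to every step.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]      # the "correct solution" (made up for the demo)
probs = [0.5] * len(TARGET)            # per-step policy: probability of emitting a 1

def run_episode():
    actions = [1 if random.random() < p else 0 for p in probs]
    reward = 1.0 if actions == TARGET else 0.0   # graded only at the very end
    return actions, reward

def update(actions, reward, lr=0.05):
    # REINFORCE-style update with a single terminal reward: every step is nudged
    # by the same scalar, so nothing is learned until a final answer is scored.
    for i, a in enumerate(actions):
        probs[i] = min(0.99, max(0.01, probs[i] + lr * reward * (a - probs[i])))

for _ in range(20_000):
    actions, reward = run_episode()
    update(actions, reward)

print([round(p, 2) for p in probs])    # slowly drifts toward TARGET
```

The only point of the sketch is that feedback arrives once, at the very end, no matter how long the trajectory was.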
The idea of a value function is something like: "I may not always be able to tell you immediately whether you're doing well, but sometimes I can give you an early warning." It's especially useful in some domains. For example, in chess, if you lose a piece, you immediately know you've made a mistake; you don't have to wait until the end of the game to realize that some of your earlier decisions were bad. This feedback helps you adjust your strategy faster and more efficiently.
A value function can speed up the process of reaching the final result. Suppose you're solving a math or programming problem and exploring a particular approach. With a value function, you can get a feedback signal early on, telling you this direction is hopeless, instead of only finding out a thousand steps later. You can conclude: "Next time I encounter a similar situation, I shouldn't take this path." That way, you adjust your strategy before ever getting the final answer.
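Continuing the same toy example, here is a sketch of what a value function adds. Again, this is an invented illustration under the same assumptions as the previous block, not a real training recipe: an estimate of how promising a partial trajectory is, learned by regressing every prefix toward the eventual outcome, so a bad step can be flagged immediately rather than at the end.

```python
import random

# Toy value function for the same 8-bit problem (illustration only). V(prefix)
# estimates how likely the final answer is to be graded correct, given the
# steps taken so far -- the chess-style "I just lost a piece" signal.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]
values = {}                                    # prefix (tuple of bits) -> estimated value

def train_value_function(episodes=50_000, lr=0.1):
    for _ in range(episodes):
        actions = [random.randint(0, 1) for _ in TARGET]
        outcome = 1.0 if actions == TARGET else 0.0
        # Monte Carlo regression: pull every prefix's estimate toward the final grade.
        for t in range(1, len(actions) + 1):
            prefix = tuple(actions[:t])
            v = values.get(prefix, 0.5)
            values[prefix] = v + lr * (outcome - v)

train_value_function()

good_prefix = tuple(TARGET[:3])                       # first three steps correct
bad_prefix = tuple(TARGET[:2]) + (1 - TARGET[2],)     # third step wrong
print(values[good_prefix], values[bad_prefix])
# The bad prefix's estimate collapses to ~0 while the good prefix stays clearly
# higher: an early warning about step three, with no need to wait for the end.
```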
Patel: In the DeepSeek R1 paper, it's mentioned that the trajectory space is so vast that it's difficult to directly infer the final result from the intermediate steps. Moreover, in programming, you may go down the wrong path first and then backtrack to correct it.
Sutskever: That sounds like a lack of confidence in deep learning. It is indeed difficult, but I believe deep learning can solve this problem. I expect value functions to be very useful. We haven't fully gotten there yet, but they will definitely be used in the future. I brought up the person with the damaged emotional center mainly to suggest that the human value function may be regulated by emotions, and that this regulation is hard-coded by evolution. Maybe this mechanism is crucial for us to act effectively.
Patel: That's exactly what I wanted to ask you about. There's an interesting aspect of the relationship between value functions and emotions: emotions are so useful in many situations, and yet they're relatively simple and easy to understand, which is really striking.
Sutskever: I have two responses. Firstly, I agree that compared with the AI systems we're researching, emotions are simple enough that we can explain them in a way humans can understand. I think it would be interesting to be able to do this.
Regarding usefulness, I think there is a trade-off between complexity and robustness. Complex things can be useful, but simple things work across many situations. The emotions we have now mainly evolved from our mammalian ancestors and were only slightly adjusted when we became early humans. We do have some social emotions that other mammals may not have, but they aren't complex. And because of that, they still serve us reasonably well in today's world, which is completely different from the environment our ancestors lived in.
Of course, emotions can also go wrong. I'm not sure whether hunger counts as an emotion, which is a bit debatable, but our intuitive sense of hunger probably doesn't guide our actions very accurately in today's world of abundant food.
03 Scaling is dead, the research era is reborn!
Patel: People always talk about scaling data, scaling parameters, and scaling computing. So, is there a more general way to think about the concept of "scaling"? Besides these, in what other aspects can we scale?
Sutskever: Here's a perspective that might be right. In the past, machine learning worked like this: people just tried things more or less at random and did their best to find interesting results. It was that simple back then.
Then, the concept of "scaling" emerged. With breakthroughs like GPT - 3, suddenly everyone realized that they should start "scaling". The word "scaling" itself is very powerful, telling people what to do. So, people began to say: "We're going to try scaling." So, you ask what we're scaling? The answer is - pre - training. At that stage, pre - training became the target to be scaled, and it was a specific "scaling formula".
The breakthrough of pre-training is that we realized this formula works. You mix some compute, some data, and a neural network of an appropriate size, and you get a result. Even better, if you scale the formula up proportionally, you get better results. This discovery is very valuable, especially for companies, because it provides a low-risk way to allocate resources.
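One commonly cited way to write down such a formula comes from the published scaling-law literature (for example the Chinchilla-style fit; Sutskever does not state this equation in the interview), expressing expected loss as a function of parameter count N and training tokens D:

```latex
% Chinchilla-style scaling-law fit (from the scaling-law literature, not from this interview)
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here E is the irreducible loss and A, B, alpha, beta are fitted constants; because the loss falls predictably as N and D grow, spending more on compute and data becomes a comparatively low-risk bet.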
In contrast, it's much more difficult to allocate resources to research. If you do research, you have to tell researchers: "Go and do some research and come up with some results." On the other hand, by obtaining more data and more computing resources, you know you'll definitely get something from pre-training.
Of course, some people are discussing technologies like Gemini, seemingly having found a way to extract more value from pre-training. But the problem is that data is