AGI Countdown: OpenAI's Chief Research Officer Warns Humanity's Window Is "Very Small"

Mark Chen, Chief Research Officer of OpenAI, has sent a strong signal: OpenAI does not believe that scaling laws have stopped working. On the contrary, pre-training, data engineering, reasoning training, and longer task chains remain the main path to AGI.

Artificial General Intelligence (AGI) is on the horizon.

Recently, Mark Chen, the Chief Research Officer of OpenAI, made a bold claim:

In a sense, you can almost feel it: AGI is coming...

We are getting closer and closer to a world where models can independently come up with more innovations - they can conduct self-sustaining research.

This is not just an improvement in efficiency. "Evolution" itself has been outsourced to silicon-based life.

As Mark Chen deftly chopped mushrooms and onions in front of the camera, he was talking about more than a bowl of soup: he was talking about the last stronghold of human civilization.

If AI can research itself, on the eve of the arrival of AGI, what role should humans play?

Every field is experiencing its own "divine move"

To understand the weight of this statement, we need to go back to the moment when Mark entered the industry.

In 2016, AlphaGo faced off against Lee Sedol.

In the second game came the famous "Move 37". The moment the stone was placed, human Go players everywhere were baffled.

Only later did people realize that the machine had made a move no human would have thought of. That moment inspired countless people, and it also pulled Mark Chen into this field.

And now?

"The craziest thing is," Mark said, "you can now see the 'divine move' in almost every field."

It exists in mathematics, in computer science, and in programming.

He described a very subtle point in time: Many people "woke up" at the beginning of this year and suddenly realized that AI agents can really do work in their own fields.

It's not a toy. It's not a demo. It can complete meaningful, long-horizon real work for you.

This means that the idea of "models doing research on their own" is no longer a plot in a science fiction movie.

It is a natural extrapolation from a series of "divine moves" that have already happened.

If you look ahead along this line, at the end stands the model that can conduct research on its own.

Scaling continues, pre-training is not dead

But what exactly supports this kind of optimism?

It's based on a belief: the scaling curve hasn't reached its end.

In the past two years, the arguments that "pre-training is dead" and "language models can't reach AGI" have popped up every now and then.

Mark Chen "strongly opposes" these doomsday prophecies.

He pointed out the pattern.

The claim that "pre-training is dead" sounds new, but it is in fact an old script that has been replayed again and again over the past few years.

Each time, someone points to a certain bottleneck and says, "It's reached the limit, we can't get through." And each time, OpenAI has come up with a new engineering technique or a new research insight to break through that wall.

Mark Chen firmly believes that "we are on an exponential curve. It has withstood nearly 10 orders of magnitude, and there is no reason why it won't continue to hold."

The most convincing evidence is that OpenAI has already won one such bet of its own.

The bet was on reasoning.

When the work that led to o1 first began, even some people within OpenAI didn't believe in it.

At that time, the paradigm of "pre-training + post-training" was so powerful that some people would naturally ask: The machine is running well, why bother with something else?

It was Jakub Pachocki, Ilya Sutskever and a few other people with beliefs and judgment who pushed it hard, and gradually turned it into a fundamental bet for the whole company.

One year later, o1 was born, and the reasoning paradigm detonated the entire industry.

The curve hasn't reached its end, and the biggest breakthroughs often come from bets that no one believes in at the beginning. These two factors together give Mark Chen the confidence to say that "self-sustaining research by models is not far away."

When the model starts to work on tasks that last for weeks or even months, the innovations it generates may reach into areas that human experts cannot see, their cognitive blind spots.

This is the cornerstone of "self-sustaining scientific research": If it can derive mathematical formulas that humans have never seen, it can certainly write better algorithm architectures than humans.

Vibe Researcher: When execution becomes cheap

We already have vibe coders - just talk and let AI write code.

Research is also moving in this direction.

In the interview, a highly controversial concept was repeatedly mentioned: Vibe Researcher.

This is a slightly self-deprecating but well-thought-out career prediction.

Mark believes that in the future, top researchers will no longer be those who write every line of PyTorch code, but those who "grasp the feeling".

Whether it's OpenAI or other laboratories, you can start to see that a large amount of work is becoming mainly about "orchestration".

Translated into plain language: Humans are responsible for coming up with ideas, and the model is responsible for doing all the work.

Researchers use their brains to come up with ideas, and the model takes care of the rest, including implementation, execution, and scheduling.

OpenAI's three-year roadmap clearly states the end goal: Let the model conduct end-to-end research, from coming up with ideas to producing results, all on its own.

But there are still many unfilled potholes on this road

Once AI can execute and orchestrate tasks autonomously, human work will be compressed to two ends:

1. Ask real questions.

2. Judge whether the answers given by AI have a "soul".

This is what is called "Taste".

Since machines don't have "life", they don't have "common sense", and thus can't have "taste".

But if we calm down and think about it, Mark Chen himself knows better than anyone that this road is far from being smooth.

The first pothole: Evaluation has collapsed.

He used an internal term, "benchmaxxing": finding a pile of questions that look almost exactly like the test set and training the model on them to death. The scores look great, but generalization does not improve at all.

Even worse, the number of recognized gold-standard benchmarks is too small.

"We are really in an evaluation crisis," he said. Classic tests like the SAT are saturated for today's models.

Moreover, once an evaluation is made public, it stops being a good evaluation, like an exam paper that becomes useless the moment it is printed.

He offered two strategies for dealing with this problem:

1. Separate the evaluation creation team from the model optimization team to form an adversarial incentive.

2. Deploy the model on a large scale and observe the failure modes in real applications.

He also pointed out that the emergence of each new ability is accompanied by corresponding evaluation requirements, and guiding the evaluation direction is a very important part of his work.
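To make the "benchmaxxing" concern concrete, here is a minimal sketch of how one might flag training questions that look almost identical to test questions. It is an illustration only, not a description of OpenAI's tooling: the function names, the word-trigram Jaccard measure, the 0.5 threshold, and the sample data are all assumptions made for this example.

```python
def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by total distinct items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_near_duplicates(train_items, test_items, threshold=0.5, n=3):
    """List (train, test, score) triples whose n-gram overlap meets the threshold."""
    flagged = []
    for train in train_items:
        train_grams = word_ngrams(train, n)
        for test in test_items:
            score = jaccard(train_grams, word_ngrams(test, n))
            if score >= threshold:
                flagged.append((train, test, score))
    return flagged


# Made-up example data, purely for illustration (hypothetical questions).
test_set = ["What is the sum of the first 100 positive integers?"]
train_set = [
    "What is the sum of the first 100 positive odd integers?",
    "Explain why the sky appears blue at noon.",
]

suspects = flag_near_duplicates(train_set, test_set)
for train, test, score in suspects:
    print(f"{score:.2f}  {train!r} ~ {test!r}")
```

In this toy data, only the first training question is flagged, because it shares most of its word trigrams with the test question. A real contamination check would need to handle paraphrases and far larger datasets, which simple n-gram overlap does not.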

The second pothole: The jagged frontier.

The model can solve difficult problems at the level of the mathematics and informatics olympiads, yet may fail at trivial things that humans do easily, like a genius who can do calculus in his head but can't tie his own shoelaces.

Where is the difference? It lies in "context" and in continual learning - applying the lessons learned from one task to the next.

This is very natural for humans, but for models, it's a hard nut that the entire industry is struggling with.

When asked if two or three fundamental breakthroughs are still needed to reach AGI, Mark didn't answer directly.

He said that continual learning is a "basic ability that must be unlocked". As for whether it counts as a "breakthrough", he's not sure, but "many shots are already aimed at the goal, and I'm quite sure they will score."

This is his attitude: the potholes are real, people are already working to fill each one, and he is betting that they can be filled.

The metaphor of the soup: Open a noodle shop after AGI

The warmest moment in the interview was the story about "soup".

It is said that Mark Zuckerberg once tried to poach OpenAI researchers with homemade soup, and Mark Chen's response was to bring the soup directly to the office and share it with everyone.

When asked about his ultimate wish after AGI is realized, the man who oversees the world's most powerful AI brain replied:

"I want to open a noodle shop. This might be my post-AGI hobby."

There is a deep meaning hidden in this answer.

When AI can complete all "self-sustaining scientific research" and all knowledge and innovation can be generated at the speed of light, the most scarce resource for humans will no longer be intelligence, but "experience".

