Altman: ChatGPT was just an accident; the all-purpose AI agent is the real passion. Karpathy: I was already thinking about this 7 years ago.
History is often shaped by countless "accidents and coincidences."
In 2022, when researcher Hunter Lightman joined OpenAI, his colleagues were busy with ChatGPT, which went on to become one of the fastest-growing products in history.
However, Lightman quietly joined a seemingly insignificant team: MathGen.
Their only task was to train an AI model to tackle high-school-level math competition problems.
Today, the work of this once-obscure MathGen team is regarded as a key reason OpenAI leads the industry in AI reasoning models!
On May 31, 2023, OpenAI published a research blog titled "Improving Mathematical Reasoning with Process Supervision," officially presenting the effectiveness of process supervision training.
Its author list included researchers associated with the MathGen team, among them Hunter Lightman. The blog post is one of the first official publications tied to the team.
On the same day, Altman posted a congratulatory message on X - this was the first time OpenAI officially confirmed the existence of the MathGen Team.
The "AI reasoning ability" they forged sits at the heart of what OpenAI sees as the ultimate technology: the AI agent!
Such an agent would, like a human, independently carry out the tasks you assign it on a computer!
"At that time, the AI's mathematical reasoning ability was simply terrible!" Lightman recalled. "Our mission was to make it learn to truly think."
Evolution from a "dumb student" to an "Olympiad gold medalist"!
To be fair, today's OpenAI models are far from perfect: they still hallucinate, stating falsehoods with complete confidence, and so-called AI agents often struggle with complex tasks.
However, a huge change is taking place!
The mathematical reasoning ability of OpenAI's top-tier models has improved dramatically!
Recently, one of OpenAI's models achieved a gold-medal-level score at the International Mathematical Olympiad (IMO), the world's most prestigious math competition for high school students.
OpenAI firmly believes that this reasoning ability will carry over to other fields!
That would make it the cornerstone of a general-purpose AI agent, the dream the company has pursued since its founding!
If the success of ChatGPT was an "unintended masterpiece," a low-key test that unexpectedly took off worldwide, then the AI agent is the product of several years of deliberate strategic planning at OpenAI!
"In the future, you just need to give instructions to the computer, and it will handle everything for you!" OpenAI CEO Sam Altman said at the company's 2023 developer conference. "This ability is the AI agent. The disruption it brings will be unprecedented!"
Will Altman's prediction come true? The world is watching. But OpenAI has already taken action!
In the autumn of 2024, its first AI reasoning model o1 emerged out of nowhere and made a big splash!
Less than a year later, the 21 core researchers behind o1 had become some of the most sought-after talent in Silicon Valley!
Zuckerberg spared no expense, offering compensation packages reportedly worth more than $100 million to poach 5 core members of the o1 team from OpenAI for Meta's new superintelligence unit.
One of them, Shengjia Zhao, a Tsinghua University alumnus, was appointed chief scientist of Meta Superintelligence Labs!
The talent war over the "AI brain" has reached a white-hot stage!
Reinforcement learning: The ancient technique that detonates the intelligence revolution
Behind OpenAI's reasoning revolution is a long-established technique now enjoying a revival: reinforcement learning (RL).
It's like a strict coach, constantly rewarding and punishing the AI's choices in a simulated environment, thus teaching the AI what is "correct."
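The reward-and-punish loop can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the training code of OpenAI or DeepMind: it assumes a made-up environment with three possible actions, each succeeding with a fixed hidden probability, and a simple epsilon-greedy learning rule. The names train_bandit, success_probs and epsilon are illustrative only.

```python
import random

def train_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error on a toy simulated environment."""
    rng = random.Random(seed)
    n = len(success_probs)
    values = [0.0] * n  # the learner's running estimate of each action's worth
    counts = [0] * n    # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=values.__getitem__)  # exploit best guess
        # The "coach": +1 for a success, -1 for a failure.
        reward = 1.0 if rng.random() < success_probs[action] else -1.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values

# Hypothetical environment: the third action succeeds most often (80%).
values = train_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
print(best_action, [round(v, 2) for v in values])
```

Nobody tells the learner which action is best; it only sees rewards and penalties, and its estimates drift toward the action that pays off most often. Systems like AlphaGo work at a vastly larger scale, but the feedback principle is the same.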
This technology is not new.
As early as 2016, AlphaGo of Google DeepMind used it to defeat the world Go champion and became famous worldwide.
At that time, Andrej Karpathy, one of OpenAI's earliest employees, had already begun to think about how reinforcement learning (RL) could be used to create an AI agent capable of operating a computer proficiently.
However, it took OpenAI several years to turn the ideal into reality.
In 2018, OpenAI introduced the first model in its groundbreaking GPT series of large language models.
Paper link: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Relying on massive data and GPU clusters, it became a genius in text processing and eventually gave birth to ChatGPT.
But its weakness was also fatal - it couldn't even handle basic math.
Then, in 2023, a breakthrough arrived!
A project codenamed "Q*" (later known as "Strawberry") combined large language models, reinforcement learning (RL), and a technique called "test-time computation"!
It gave the model extra thinking time, allowing the AI to repeatedly plan, deduce, and verify before giving an answer.
This gave rise to the approach known as "chain of thought" (CoT), and the AI's performance on math problems it had never seen before was transformed!
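The idea of spending extra computation before answering can be sketched in a few lines. This is a toy illustration under stated assumptions, not how o1 works internally: it assumes a hypothetical unreliable solver (propose) that gets a sum right only some of the time, an independent check (verify), and a thinking budget that caps the number of attempts. All function and parameter names are made up for the example.

```python
import random

def propose(rng, a, b):
    """Hypothetical unreliable solver: returns a + b plus a random error."""
    return a + b + rng.choice([-2, -1, 0, 1, 2])

def verify(a, b, candidate):
    """Check a candidate independently by undoing the addition."""
    return candidate - b == a

def solve(a, b, budget, seed=0):
    """Try up to `budget` candidates; return (answer, attempts used)."""
    rng = random.Random(seed)
    for attempt in range(1, budget + 1):
        candidate = propose(rng, a, b)
        if verify(a, b, candidate):
            return candidate, attempt  # a verified answer was found
    return None, budget  # the thinking budget ran out

answer, attempts = solve(17, 13, budget=100)
print(answer, attempts)
```

With a budget of one attempt, this solver is right only about one time in five. Allowing more attempts plus a verification step makes a correct answer almost certain, at the cost of more computation per question.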
"I witnessed the model starting to truly reason," researcher El Kishky said excitedly. "It would find its own mistakes, then backtrack and correct them. It would even show signs of frustration. It felt like reading someone's thoughts!"
None of these techniques was original on its own.
But OpenAI's genius lay in combining them in an unprecedented way, which led directly to its later ace: o1.
At that moment, OpenAI suddenly realized: Isn't this ability to plan and fact-check the perfect engine to drive an AI agent?
"We solved a problem I had been thinking about for years!" Lightman said. "It was the most exciting moment in my scientific research career!"
Detonating reasoning: A bottom-up gamble
With the AI reasoning model, OpenAI's ambition was completely ignited.
They discovered two new evolutionary paths:
1. Invest more computing power in post-training, the stage that follows a model's initial training!
2. Give the model more thinking time and computing power when answering questions!
"OpenAI has never just thought about the present, but how to infinitely expand its advantages in the future!" Lightman said.
After the breakthrough of the "Strawberry" project in 2023, OpenAI quickly assembled an "AI agent" task force led by researcher Daniel Selsam.
Their only goal was to push this new ability to the limit!
Initially, the company drew no strict distinction between the "reasoning model" and the "AI agent."
Both efforts shared a single goal: to create a super AI that could complete complex tasks!
Ultimately, the work of this task force merged into the more ambitious o1 model project, led by co-founder Ilya Sutskever and other senior figures.
To create o1, OpenAI had to stake its most precious resources: top talent and GPUs.
At OpenAI, resources are allocated on the strength of results, not seniority.
Researchers must deliver striking breakthroughs to win the company's full support.
At OpenAI, all research and innovation originates on the front line and flows bottom-up, Lightman explained.
"When we presented the amazing evidence of o1 on the table, the whole company immediately reached a consensus: this is it, go all out!"
Many former employees believe that it was OpenAI's almost paranoid pursuit of artificial general intelligence (AGI) that gave birth to this reasoning revolution.
The company was single-minded, not swayed by short-term products, and bet all its chips on building the most powerful AI brain. This kind of all-out gamble is almost impossible for other AI giants.
Looking back, the decision was extremely far-sighted!
By the end of 2024, many AI giants found that the traditional approach of "piling up data and computing power" was yielding diminishing returns.
Today, much of the momentum in the AI field comes from progress in "AI reasoning"!
Can AI really "think"? The end of a philosophical debate
Is AI really "reasoning"? Does it really have "thoughts"?