In 12 hours, a 42-year-old puzzle was solved, and AI took a step closer to AGI.
A problem that had puzzled the mathematics community for 42 years has been solved.
It wasn't solved at a top-tier research institute, nor through a newly published paper, but in a human-machine dialogue spanning three days and roughly 12 hours of conversation.
On April 28, 2026, Ernest Ryu, a senior researcher at OpenAI, recalled the experience on an OpenAI podcast. He had gone back and forth with ChatGPT, repeatedly pointing out the model's errors and adjusting direction, steadily closing in on a conclusion. The problem, concerning the convergence of a classic optimization algorithm, had remained open for 42 years. When a complete proof finally emerged, Ryu verified it by hand and then had the model review it again; it held up entirely.
"Without these tools, it might have taken me three months or even longer."
In the same conversation, Sébastien Bubeck also mentioned that the model has reached the top level at the International Mathematical Olympiad and has begun to provide substantial help on some research-level problems. It can even connect existing results scattered across different fields, finding paths that earlier researchers had missed.
The boundaries of AI's capabilities are genuinely being pushed forward. On the path toward AGI, this is a signal worth taking seriously.
Section 1 | From Tool to Participant
First, let's establish a frame of reference.
At the beginning of 2025, there were still things the model couldn't do: splitting costs among three campers when the receipt listed more than a dozen items, or finding a workable Zoom slot for people across three different time zones. These seemingly simple tasks were ones the model could not yet complete reliably.
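To make the bill-splitting task concrete, here is a minimal sketch of what "reliable" means for it: the shares must sum exactly to the receipt total, with leftover cents assigned deterministically. The function name, receipt values, and rounding policy are all illustrative assumptions, not anything from the podcast.

```python
# Hypothetical version of the bill-splitting task the article describes:
# sum an itemized receipt, split it evenly among n people, and hand out
# any leftover cents one at a time so the shares reconcile exactly.
from decimal import Decimal, ROUND_DOWN

def split_evenly(items: list[Decimal], n_people: int) -> list[Decimal]:
    """Split the sum of `items` into n_people shares that add up exactly."""
    total = sum(items, Decimal("0.00"))
    # Round each base share down to the cent, then distribute the remainder.
    base = (total / n_people).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    shares = [base] * n_people
    leftover = total - base * n_people
    cent = Decimal("0.01")
    i = 0
    while leftover > 0:
        shares[i] += cent
        leftover -= cent
        i += 1
    return shares

receipt = [Decimal(x) for x in ("12.50", "3.99", "7.25", "19.00", "4.31")]
print(split_evenly(receipt, 3))  # [Decimal('15.69'), Decimal('15.68'), Decimal('15.68')]
```

The point is not that the arithmetic is hard, but that a reliable answer requires bookkeeping invariants (shares summing to the total) that early-2025 models often fumbled.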
That same year, an open problem that had puzzled the mathematics community for 42 years was solved in a 12-hour human-machine dialogue.
This change cannot be explained by the model simply becoming smarter. About a year and a half earlier, Sébastien Bubeck had taken part in a debate at an academic conference on whether scaling large language models could help solve major open mathematical problems. In the initial vote, 80% of participants thought it impossible; after the debate, the split was 50-50. Just eight months later, the model was doing research-level mathematics.
This is no longer a simple question-answering process. The model didn't produce the answer all at once, nor did it progress along a stable path. The whole process was iterative: proposing ideas, reasoning them through, finding holes, adjusting course, asking follow-up questions, and expanding.
This is closer to the real research state.
In the past, even when the model could solve complex problems, it essentially stopped at outputting results. Now it is beginning to enter the process itself. Research advances by gradually approaching a goal through repeated attempts, not by jumping straight from problem to answer. Once the model enters that process, its role shifts from tool to participant.
Meanwhile, when the research team used the model on a batch of long-unsolved mathematical problems, they found that some answers were already latent in existing results from different fields that had simply never been connected. Through large-scale retrieval and reasoning, the model finds usable clues in that vast body of knowledge and establishes the connections. On that basis, new results begin to emerge.
Therefore, this case is not just one successful proof; it marks AI's formal entry into the real research process.
Section 2 | Longer Thinking, Real Breakthrough
If we read this simply as the model suddenly becoming stronger, it's easy to draw the wrong conclusion: that it is the result of a single technological leap. Sébastien Bubeck's explanation in the interview is the opposite. No single factor explains it all; the change is the superposition of multiple capabilities arriving at the same time.
Among all these capabilities, the core breakthrough is that the model can now sustain longer chains of reasoning while keeping its train of thought coherent throughout.
Why is this crucial? In mathematics, and in scientific research more broadly, the difficulty often lies not in any single step but in whether the entire chain of derivation holds from end to end. If one link deviates, everything downstream loses its meaning. That demands continuous checking and correction over a long stretch of reasoning; merely moving forward is not enough. Past models performed well at short-range reasoning but tended to drift as the chain lengthened, making sustained progress on complex tasks difficult.
To see how large this change is, go back four years. Around 2022, Google released Minerva, a model built specifically for mathematics. Sébastien Bubeck recalled being so excited he nearly jumped out of his chair, simply because, given the coordinates of several points on a plane, the model could produce the straight line passing through them.
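For scale, the Minerva-era task the article mentions is a few lines of ordinary code. This toy sketch (the function name and point values are illustrative, and this is plain coordinate geometry, not how the model worked internally) shows just how small that 2022 milestone looks next to a 42-year open problem:

```python
# Toy version of the Minerva-era task: given collinear points on a plane,
# recover the slope and intercept of the line through them.
def line_through(points):
    """Return (slope, intercept) of the line through collinear points."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    # Sanity-check that every given point actually lies on the line.
    assert all(abs(y - (slope * x + intercept)) < 1e-9 for x, y in points)
    return slope, intercept

print(line_through([(0, 1), (1, 3), (2, 5), (3, 7)]))  # (2.0, 1.0)
```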
A new yardstick, "AGI time", is gradually taking shape in the technology industry. It measures not how smart the model is but how long it can think continuously: from handling simple problems in a few seconds, to sustaining reasoning for minutes, to now exploring a hard problem for hours or even days.
Ernest Ryu drew an analogy with Codex in the interview. Codex can work on large code repositories over a long cycle, advancing complex tasks across continuous interactions by compressing and organizing its conversation history. Ryu believes mathematical research will follow the same path: mathematical notes play the role of the code repository, and the reasoning process is the long-running work session. The model needn't complete every derivation in a single conversation. Like a human researcher, it can make some progress today, organize it into notes, pick up again next week, and finally condense months of thinking into a paper.
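The note-taking loop Ryu describes can be sketched abstractly. In this hypothetical skeleton, `reason_step` and `summarize` are stand-ins for model calls (here reduced to trivial string operations); the structural point is only that state persists as compact notes rather than as a full transcript:

```python
# Hypothetical sketch of the long-horizon workflow from the Codex analogy:
# keep a bounded set of "notes" instead of the whole conversation, and fold
# each session's findings back into them before the next session starts.

def reason_step(notes: str, session: int) -> str:
    """Placeholder for one session of model reasoning over the notes."""
    return f"finding from session {session}"

def summarize(notes: str, finding: str, max_len: int = 200) -> str:
    """Placeholder for compressing notes + new finding under a size budget."""
    combined = f"{notes}; {finding}" if notes else finding
    return combined[-max_len:]  # crude truncation stands in for real summarization

notes = ""
for session in range(1, 4):          # three separate "work sessions"
    finding = reason_step(notes, session)
    notes = summarize(notes, finding)

print(notes)
```

The design choice this illustrates is the same one the article attributes to Codex: the bottleneck is not a single long context window but a discipline for carrying distilled state across sessions.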
Following this logic forward: if the model can keep its reasoning stable over weeks or even longer, the class of problems it can handle changes qualitatively. Many research tasks that require long, repeated deliberation will gradually come within its reach.
Getting "closer to AGI" doesn't require waiting for a sudden dividing line. The more realistic path is a continuous extension of thinking time: from short-term reactions, to sustained reasoning, to long-term thinking that approaches the rhythm of human research.
What determines the boundaries of AI's capabilities is not only what it can do, but also how long this ability can be maintained.
Section 3 | Science Is Being Reorganized
Mathematics is just one of the first disciplines affected by AI's capabilities. What really needs to be focused on is how the way of scientific work will change once this ability becomes widespread.
The first level is the way knowledge is verified. Bubeck's team has tested a large number of published mathematical papers and found a considerable number of errors, some minor, some fundamental. In the past, fully verifying a 300-page proof after publication could take years, during which an entire field might keep building on a wrong conclusion. AI can now drastically shorten that verification cycle; the reliability of the existing knowledge base is being re-audited.
The second level is the starting point of research. The model is beginning not only to answer questions but to pose them. The interview mentions that their internal model can already generate research hypotheses good enough that human researchers consider the direction worth a dedicated paper. When posing questions can itself be deeply assisted by AI, the abilities researchers truly need to retain become judging which questions are worth pursuing, making choices at critical junctures, and identifying which directions have real breakthrough potential. These are precisely the parts the model currently finds hardest to replace.
Of course, this restructuring doesn't automatically improve research outcomes. Over-relying on the model can leave researchers with only a surface understanding of results, eroding their ability to carry out deep derivations themselves. The more powerful the tool, the higher the demands on the user's judgment.
In the long run, scientific research is undergoing a major adjustment of the division of labor. AI can take on more and more of the repetitive derivation work, while the requirements for judging what to do and which direction to take are actually increasing. The core value of researchers is shifting from execution to decision - making.
When computers first appeared, some predicted the mathematics community would run out of hard problems. Instead, computing opened entirely new research fields, and the number of hard problems only grew. The same logic applies today: the more powerful the tool, the more questions are worth asking, and the scarcer the people who can pose good ones. Nor will this change be confined to mathematics. Materials science, biology, and every discipline that demands large amounts of reasoning and verification will see AI gradually get involved, so long as its problems have clear structure.
What AI accelerates is the pace of science. What remains unchanged is that science needs humans to define the direction.
Conclusion | The Direction Still Lies with Humans
A problem that had no answer for 42 years got one in 12 hours. This is not the end but a new starting point.
What really narrows the gap is that the model can now sustain coherent reasoning for longer and enter the research process itself. That matters more than any single breakthrough.
The tool is taking over more and more of the execution. The remaining questions are: Can you understand the results? Can you pose the next, better question?
Thinking is accelerated, but the choice still lies with humans.
Original article link:
https://www.youtube.com/watch?v=9-TVwv6wtGQ&t=846s
This article is from the WeChat official account "AI Deep Researcher". Author: AI Deep Researcher. Editor: Shen Si. Republished by 36Kr with authorization.