Google DeepMind's latest paper has just been published in Nature, unveiling its most powerful mathematical model yet for the IMO.
DeepMind's AlphaProof achieved a silver-medal score at the IMO, just short of gold. It combines the intuition of large language models, reinforcement learning, and Lean formal proofs to solve several difficult problems. Although it still has limitations in speed, generalization, and reading problems on its own, it has opened a new stage of collaboration between human mathematicians and AI.
Every summer, young mathematical geniuses from around the world gather to participate in the International Mathematical Olympiad (IMO), known as the "World Cup of Mathematics."
The competition consists of six problems spread over two days, each worth a maximum of 7 points for a total of 42. The difficulty is extremely high: often fewer than 1% of contestants solve every problem.
[Figure: distribution of contestant scores per problem; the horizontal axis is the score (out of 7), the vertical axis is the number of contestants.]
In recent years, the IMO has also been regarded as one of the ultimate challenges in the field of AI and an ideal stage to test the advanced mathematical reasoning ability of AI.
In 2024, Google DeepMind sent a special "contestant" to the IMO: an AI system called AlphaProof.
It scored 28 points, just one point short of gold, reaching silver-medal level.
This is the first time in history that an AI system has achieved a medal-equivalent score at a top-level event like the IMO, marking a new milestone in machines' ability to tackle difficult mathematical problems.
AlphaProof: A Mathematical Problem-Solving AI Expert Appears
AlphaProof is the latest "mathematical problem-solving AI" system developed by DeepMind, designed specifically to prove complex mathematical propositions.
Put simply, if we think of a math problem as a "maze" to be navigated, AlphaProof is a self-taught AI expert at finding its way through.
Unlike models such as ChatGPT that "think" purely in natural language, AlphaProof takes a different approach: it reasons in a computer-verifiable formal language, ensuring that each step of the derivation is strictly correct and ruling out steps that look like "flashes of inspiration" but are actually baseless.
AlphaProof uses Lean, a popular formal proof language in the field of mathematics, to write proofs.
[Figure: an example of the Lean language]
Lean's syntax blends mathematics with a programming language, allowing each step of the AI's reasoning to be automatically checked and verified and avoiding the fallacies that conventional language models can produce.
AlphaProof's answers are therefore not prose explanations that humans must review, but rigorous proofs verified line by line by a computer.
This way of "hardening" AI thinking into a mechanically verifiable form ensures that AlphaProof has no element of luck when solving even the most difficult problems.
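To make the idea concrete, here is a minimal toy sketch of what Lean proofs look like; it is written purely for illustration and is not an excerpt from AlphaProof's actual IMO solutions.

```lean
-- Toy Lean 4 examples, for illustration only (not AlphaProof's output).
-- Every step is elaborated and checked by Lean's kernel: if any step were
-- invalid, the file simply would not compile.

-- A statement proved by citing an existing library lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Another small statement, proved in tactic mode and checked just the same.
example (n : Nat) : 0 < n + 1 := by
  exact Nat.succ_pos n
```

However a proof is produced, the checker applies the same standard: either every step type-checks, or the proof is rejected outright.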
Technical Secret: The Combination of Large Models and Reinforcement Learning
The core secret of AlphaProof's success lies in the ingenious combination of the "smart intuition" of pre-trained large language models and the "diligent practice" of the AlphaZero reinforcement learning algorithm.
Language models are good at learning human problem - solving experiences and patterns from massive amounts of data;
Reinforcement learning allows the AI to continuously improve its strategy by trying and making mistakes, just like a child learning to ride a bike through repeated practice.
The DeepMind team first used the large model to lay a "knowledge" foundation for AlphaProof and then let it practice repeatedly in a simulated mathematical environment to discover problem - solving strategies on its own.
The researchers first collected nearly one million math problems (covering different fields and difficulty levels) and used Google's latest Gemini to automatically translate these problems described in natural language into formal Lean code expressions.
This process is equivalent to creating an unprecedentedly large question bank for AlphaProof - the team obtained approximately 80 million formal mathematical propositions for the AI to practice proving.
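As a rough illustration of what such a translation might look like (this particular problem and its Lean rendering are hypothetical examples, not drawn from DeepMind's dataset), an informal statement like "show that n² + n is even for every natural number n" could become a formal Lean statement whose proof is deliberately left open for the prover to find:

```lean
-- Hypothetical autoformalization example (not from DeepMind's actual corpus).
-- Informal problem: "Show that n^2 + n is even for every natural number n."
-- Autoformalization produces only the formal statement; `sorry` is a placeholder
-- that the proving system would have to replace with a genuine proof.
theorem n_sq_add_n_even (n : Nat) : ∃ k : Nat, n ^ 2 + n = 2 * k := by
  sorry
```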
With this "ocean of questions," AlphaProof first underwent supervised fine-tuning to master basic Lean proof techniques.
Then it entered the reinforcement learning stage: just as AlphaGo improved by playing games of Go against itself, AlphaProof practices against itself in the Lean proof environment.
Whenever AlphaProof finds a correct proof for a problem and passes the verification, it uses this successful case to immediately strengthen its own model parameters, enabling it to solve more difficult new problems more effectively next time.
This training cycle of learning while practicing continues, and AlphaProof keeps improving in proving millions of problems, gradually mastering the key skills required for high - difficulty problems.
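This cycle is essentially an expert-iteration loop: attempt proofs, keep only those the Lean checker accepts, and fine-tune on them. The Python sketch below shows the shape of such a loop under that reading; `prover`, `attempt_proof`, `verify`, and `fine_tune` are hypothetical placeholders, not DeepMind's actual interfaces.

```python
# Minimal sketch of the "learn while practicing" cycle described above.
# All names (prover, lean_checker, attempt_proof, verify, fine_tune) are
# hypothetical placeholders; this is not DeepMind's actual training code.

def expert_iteration(prover, problems, lean_checker, num_rounds=10):
    """Alternate between searching for proofs and fine-tuning on verified successes."""
    for _ in range(num_rounds):
        verified = []
        for problem in problems:
            proof = prover.attempt_proof(problem)            # model-guided proof search
            if proof is not None and lean_checker.verify(problem, proof):
                verified.append((problem, proof))            # keep only machine-checked proofs
        prover.fine_tune(verified)                           # reinforce on its own successes
    return prover
```

Only proofs that pass the checker ever feed back into training, which is what keeps such a loop from reinforcing plausible-looking but wrong arguments.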
When searching for proofs, AlphaProof does not blindly "brute-force enumerate."
It uses a strategy similar to the Monte Carlo tree search in chess-playing AI, intelligently breaking complex problems into sub-goals, attacking them one by one, and flexibly adjusting the search direction.
In some cases, AlphaProof can take just the right step in seemingly infinite possible derivations, showing a "flash of inspiration" like that of a human mathematician.
This is due to both the intuitive guidance provided by the large model and the comprehensive search ability built up through repeated exploration in reinforcement learning; the combination of the two makes AlphaProof better than any previous AI system at finding solutions in complex mathematical mazes.
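To give a feel for what model-guided search means in code (greatly simplified relative to the AlphaZero-style tree search the paper describes), here is a best-first search sketch; `policy_model.suggest_tactics` and `apply_tactic` are hypothetical stand-ins for a language model proposing proof steps and Lean attempting to execute them.

```python
import heapq

# Simplified best-first proof search guided by a model's tactic suggestions.
# policy_model and apply_tactic are hypothetical placeholders, not real APIs.

def guided_proof_search(initial_state, policy_model, apply_tactic, max_expansions=10_000):
    """Expand the most promising proof state first; return the tactic list if a proof is found."""
    counter = 0                                   # tie-breaker so states never get compared directly
    frontier = [(0.0, counter, initial_state, [])]
    for _ in range(max_expansions):
        if not frontier:
            break                                 # search space exhausted
        _, _, state, history = heapq.heappop(frontier)
        if state.is_solved():                     # no open subgoals: a complete proof
            return history
        for tactic, score in policy_model.suggest_tactics(state):
            next_state = apply_tactic(state, tactic)
            if next_state is not None:            # Lean accepted the step
                counter += 1
                heapq.heappush(frontier, (-score, counter, next_state, history + [tactic]))
    return None                                   # budget exhausted without a proof
```

The actual system is far more sophisticated; this sketch only conveys the basic idea of letting the model's suggestions steer which branch is expanded next.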
Winning a Silver Medal at the Olympiad: A Milestone for AI Problem-Solving
DeepMind's AlphaProof, working together with AlphaGeometry 2, solved 4 of the 6 problems at the 2024 IMO, scoring 28 points (out of 42) and reaching silver-medalist level.
That score was only 1 point short of the gold-medal threshold of 29 points.
AlphaProof solved 3 problems on its own (2 in algebra and 1 in number theory), including Problem 6, the hardest in the entire competition; only 5 of the more than 600 top students earned full marks on it.
The remaining geometry problem was handled by AlphaGeometry 2, a model specialized in geometry, while the two combinatorics problems went unsolved because of difficulties in formalizing them and a combinatorial explosion in the search.
In the end, the system earned full marks on 4 problems (and 0 on the other 2), placing its score exactly at the top of the silver-medal range.
It is worth noting that fewer than 10% of human contestants win a gold medal; this year, 58 contestants scored 29 points or more.
The silver-medal level achieved by AlphaProof is comparable to that of an internationally top-tier high-school prodigy with years of training.
This result shocked many experts: the famous mathematician and Fields Medalist Timothy Gowers commented that some of the ingenious constructions given by AlphaProof "far exceed what I thought AI could currently achieve."
AlphaProof's performance at the IMO is a milestone.
This is the first time that AI has reached the level of a human medalist in such a high - difficulty mathematical competition, indicating a major leap in AI's mathematical reasoning ability.
In the past, even when large models had absorbed huge numbers of textbooks and theorems, they often struggled to fully solve Olympiad-level challenges, let alone give rigorous proofs.
Through formal proof and reinforcement learning, AlphaProof gives AI a genuine ability to solve open-ended mathematical problems.
The fact that it successfully proved the most difficult questions in the IMO also gives hope: perhaps in the future, AI has the potential to assist humans in tackling unsolved mathematical conjectures.
Limitations and the Future: The Advancement Path of AI Mathematicians
Although AlphaProof is impressive, it still has many limitations at present.
Firstly, problem-solving efficiency is an issue.
Human contestants must finish each day's 3 problems within 4.5 hours, whereas AlphaProof, although it eventually found solutions to 3 problems, took nearly 3 days.
This shows that there is still a lot of room for improvement in the search speed and computational resources of the current AI proof methods.
Secondly, AlphaProof is not omnipotent. The two combinatorial mathematics questions it failed to solve reflect that certain types of problems are still difficult for AI.
This type of question often involves highly unstructured innovative thinking, beyond the scope of what AlphaProof has "seen" in training.
Therefore, how to make AI more general and adaptable to deal with novel and unseen difficult problems is an important challenge in the next step.
Thirdly, AlphaProof currently requires humans to translate the problems into formal Lean statements first; it does not understand natural-language problems on its own.
This means that it cannot read questions independently, nor can it propose new questions or judge which problems are worth researching like human mathematicians.
As Yang-Hui He of the London Institute for Mathematical Sciences pointed out, AlphaProof can be a powerful tool to assist mathematicians with proofs, but it cannot replace humans in discovering and selecting research topics.
Facing these limitations, the DeepMind team said that they will continue to explore various ways to improve AI's mathematical reasoning ability.
One future direction is to free the AI from its dependence on manual translation, so that it can read and understand problems stated in natural language and produce formal proofs directly.
At the same time, different types of mathematical problems (such as combinatorics or geometry) may call for more specialized strategies, such as integrating symbolic computation, knowledge bases, or models trained on different fields, to broaden the range of problems AI can solve.
Some researchers also envision that in the future, mathematicians can work together with such AI proof assistants:
The AI can quickly verify human conjectures and small lemmas, and even try bold ideas to tackle long - standing difficult problems;
Humans can focus on proposing meaningful questions and overall proof concepts.
It can be predicted that with the continuous improvement of systems like AlphaProof, we are entering a new era of human - machine cooperation in exploring the frontiers of mathematics.
The formal reasoning ability demonstrated by AlphaProof also has implications for AI security and reliability.
Each step of its reasoning can be traced and verified. This "rigorous verification" style may be used to improve future large models, reducing baseless claims when they answer open-ended questions.
As AI becomes more and more powerful, we hope it can be a down-to-earth and rigorous "mathematician."
After this Olympiad outing, AlphaProof shows us the dawn of AI approaching top human performance in the realm of pure reason.
Of course, the creativity and insight of top human mathematicians remain irreplaceable; at least in posing questions and shaping the overall ideas of a proof, AI still has a long way to go.
But there is no doubt that AI is becoming a powerful assistant for humans to explore the unknown in mathematics.
Whether it is humans or AI, on the road to climbing the peak of truth, courage, patience, and respect for the unknown are always needed.
References:
https://www.nature.com/articles/s41586-025-09833-y
https://www.julian.ac/blog/2025/11/13/alphaproof-paper/
This article is from the WeChat public account "New Intelligence Yuan", author: New Intelligence Yuan. Republished by 36Kr with permission.