"Human exam problems are being overwhelmed by AI," a conversation between Terence Tao and Mark Chen: large models are no longer "bad at math."
Recently, Fields Medalist Terence Tao and Mark Chen, Chief Research Officer at OpenAI, took part in a fireside chat at a symposium of the Institute for Pure and Applied Mathematics (IPAM). The two discussed the leaps AI has made over the last twelve months and how it will fundamentally change the way mathematical research is done.
In the conversation, Terence Tao and Mark Chen discussed the crucial changes in AI's capabilities: a year ago, Terence Tao rated GPT's performance in mathematics as "like a very inefficient graduate student." Today, AI has reached gold-medal level at the International Mathematical Olympiad (IMO), and human-created benchmarks are quickly being saturated.
One could say that, in mathematics, large models have essentially shed the label of "poor student."
In Terence Tao's view, mathematics is a field where experiments and trial and error are cheap. "If you're an engineer and you come up with a faulty bridge design, that's an expensive mistake; if you're a surgeon and you operate on the wrong organ, that's an expensive mistake." In mathematics, by contrast, a failed attempt costs little more than time.
Since AI has begun to quickly solve some long-neglected Erdős problems, the mathematical community is also rethinking the division of labor in research, the collaboration between humans and machines, and changes to the education system. If calculation and verification can be outsourced to machines, the shape of mathematical research could gradually change.
Here is a summary of the fireside chat between Terence Tao and Mark Chen:
Question: What did you say about AI's performance in mathematics a year ago? What has changed in the last twelve months?
Terence Tao: The change has been very significant. AI itself has made progress, but more importantly, it has been integrated into our daily research. Deep research and literature reviews are now far more efficient than traditional methods, and code generation has become quite reliable.
Terence Tao, Fields Medalist, Chinese-American mathematician and professor at the University of California, Los Angeles
As a pure mathematician, I'm not overly reliant on AI, but it has genuinely changed my problem-solving habits. For example, if I want to test a conjecture, I first let AI try; if I know how to prove a lemma but writing it out is too tedious, I hand it to AI. But on the hardest problems I still can't have deep discussions with it, at least not at this stage.
On a larger scale, the mathematical community is beginning to recognize that AI is already a reality and that we need to adjust how we work. Tedious tasks we used to delegate to graduate students can now be given to AI, which opens up large-scale research projects we couldn't have imagined before.
So although using AI to prop up existing workflows is still a bit clumsy, I see more promise in developing new workflows built around AI. It's like the period after the invention of the automobile: a city can't keep planning its roads only for horse riders. We're in that transition phase now.
Mark Chen: I don't blame Terence for saying a year ago that AI was like an inefficient graduate student; that really was the case back then. Internally, we track a metric we call the "autonomous working period" to measure how long a model can work continuously without falling apart. Last year it was still measured in minutes: the model hallucinated constantly and got confused when given too many tasks.
Mark Chen, lead researcher at OpenAI
But for many people, last year was a turning point: errors have decreased, and you can let AI work longer with a clear conscience. That lets us move past the earlier scaffolding-style support and start tackling larger problems in genuine collaboration with the model.
For example: a year ago, AI would probably have won a bronze medal at the IMO; this past summer it won gold-medal scores in the top high-school mathematics and programming competitions. Human-created benchmarks will soon be saturated. So people are turning their attention to mathematical research itself, and that's our goal.
OpenAI doesn't just want to solve Olympiad problems. The real ambition is to push the scientific frontier forward. The time horizon of the tasks models can handle is now long enough that we can actually begin. We haven't fully achieved it yet, but the trend is already very clear.
Question: Does solving the Erdős problems represent the current ability of AI?
Terence Tao: I've followed the collection of Erdős problems for a long time. Their difficulty varies enormously. Some have occupied the mathematical community for decades, and even in my own publications I've only made minor progress on them. On these hard problems, AI really can't help.
But Erdős posed thousands of problems, many of them "long-tail" problems that have been neglected for a long time with almost no follow-up research. This is exactly where AI has broken through: roughly twenty to thirty such problems have been solved by AI with minimal human supervision, and the solutions can usually be verified with other AI tools. This shows we've developed a workflow in which we aren't drowned by AI's wrong answers.
This process points to a possible cultural shift: mathematicians shouldn't focus only on the few extremely hard problems but should start publishing lists of problems they genuinely want answered. For example, if you publish a list of a hundred problems, AI might solve 10% and a high-school student another 5%. That way the community itself can drive mathematical research forward.
Question: Will mathematics become like biology, a large - team collaboration?
Mark Chen: The trend is very clear. In other scientific fields, the number of co-authors per publication has grown exponentially over time; mathematics and theoretical physics are the exceptions. But now we're seeing change. Projects like "First Proof" and the Erdős problems find genuinely worthwhile problems through deep interaction with the community.
We've also made similar attempts in physics. We've invited leading physicists to create a list of important problems that can be processed by AI. This in turn helps us improve the model. Our goal is to build a platform that enables global scientists to accelerate their research and strengthen the entire mathematical community.
We can already observe young people in their early twenties solving problems independently with the model. Although there haven't been any major breakthroughs yet, it's already enough to change the entire research ecosystem.
Question: Can AI enable the division of labor in mathematical research?
Terence Tao: This is exactly where AI has the greatest potential. Traditionally, a mathematician has to handle every step alone: posing the problem, devising strategies, choosing among them, executing them, verifying the results, and writing up the publication. We train everyone to be competent at each step, ideally specializing in some area. But we can't have a real division of labor as in industry, where one person is responsible for the technology and another for project management.
Now, with AI and formal verification tools, mathematical projects could work like modern industry: everyone specializes in a single step, and when no one in a collaboration can perform a step, AI takes it over. Of course, AI's abilities are still uneven, and the process can't be fully automated. For example, if you have AI generate strategies in bulk but verification can't keep up, you end up with hundreds of strategies you can't process. If verification capability one day catches up, we'll have a completely new and extremely efficient way of doing mathematics.
Mark Chen: I'd like to add something. AI's abilities really are uneven, which is why human-machine collaboration is so effective. Interestingly, AI is closer to humans in some respects than you might think: you have to apply a lot of reinforcement learning just to keep it from giving up as easily as a human would.
For example, if you give AI a task that's too hard, after a few attempts the model will think: "This is too difficult, I can't do it, I'll just pretend to try." We saw this with the Erdős problems: given a task, the model first searches the internet, and if it finds that the problem is open, it immediately gives up. You have to tell it: "Don't search the internet; solve it yourself, it's not that hard."
Question: Will the future be a collaboration between humans and many AI agents, or will AI dominate?
Terence Tao: I think the answer is both yes and no. The kind of mathematics we do today may develop in that direction, but entirely new, currently unimaginable forms of mathematics will also emerge. Mathematics is infinite; there is no upper bound on difficulty, and some problems are even unsolvable, so AI can't "mine all the bitcoins." There will always be a frontier. Human abilities and those of current large language models complement each other well. I'm convinced the best combination will always be some intricate "human + machine" pairing; only its form will change over time.
Question: Does higher intelligence require computing power or algorithms?
Mark Chen: Both are absolutely necessary. OpenAI's entire research approach is essentially to improve algorithms so they can exploit the compute scales we'll have in one or two years. The algorithms we know are quite fundamental and do scale, but it takes a great deal of engineering and fine-tuning to make sure they're really ready for the next order of magnitude.
The good news is that this is a multi-dimensional problem. We can scale model size, building a larger "brain" with more knowledge; the broader and deeper the knowledge, the easier it is to make connections and leaps. We can scale inference, so the model can link its knowledge together and reach new insights. And we can have the model generate new knowledge for itself, strengthening its abilities in particular areas. Together, these dimensions push the model toward more autonomous, longer-horizon tasks.
Question: Does the "First Proof" project represent the future form of mathematics?
Terence Tao: It will be one point in the future mathematical landscape. "First Proof" is a very interesting experiment. The proofs AI generates are of good quality, but we've also found a clear verification bottleneck: we generated many proofs, some very bad, some good, some similar to those in the literature, yet there is currently no effective method to assess how novel and interesting each proof really is.
To make effective use of AI's new abilities, we need to design challenges that are easy to verify. To some extent, how much AI you can use and how much automation you can achieve is directly proportional to how strong your verification capability is. So progress will appear first in areas that are easy to formalize, such as combinatorics, or in numerical problems where an answer, once found, is easy to check.
But other areas of mathematics are different. If you need a new theory, a new conjecture, or a new problem-solving strategy, verification is much harder. If AI generates hundreds of strategies, in the end only human experts can evaluate them. That's a bottleneck.
Question: What happens if the goals aren't set correctly?
Terence Tao: This is a very subtle problem. AI is almost too good at executing goals literally. If you ask it to solve a problem and find a proof of a theorem, AI may one day simply hand you the answer. But what you really want is the human process of striving: trying, failing, finding counterexamples, combing the literature, sharing intermediate results. That's where the real value of solving a problem lies. Define the goal too narrowly and you risk losing all of that. So we need to set goals more carefully and preserve the randomness and room for exploration in the research process.
Mark Chen: That suggests an interesting thought experiment: train a model that only has knowledge up to a certain point in time, and then imagine running a "First Proof" as of that date. Today we have the benefit of hindsight; we know which techniques turned out to matter.