Just now, GPT-5 passed the "Gödel Test" for the first time and solved three major mathematical conjectures.
GPT-5 has passed the "Gödel Test" for the first time, cracking three major combinatorial optimization conjectures! Moreover, it independently overturned an existing conjecture and presented a brand-new, effective solution, astonishing OpenAI research scientists.
A historic moment has arrived for AI!
GPT-5 has successfully cracked three conjectures and passed the "Gödel Test".
OpenAI research scientist Sebastien Bubeck exclaimed that even top doctoral students usually need several days to solve open-ended problems of this kind.
Unlike earlier efforts, this study, led by researchers at the University of Haifa and Cisco, is the first to challenge AI with open mathematical conjectures.
Paper link: https://arxiv.org/pdf/2509.18383
In the paper, the team designed five test tasks in the field of combinatorial optimization, providing one or two references for each task.
For three relatively simple problems, GPT-5 presented nearly perfect solutions, demonstrating its strong logical reasoning ability.
Surprisingly, in Conjecture 2, it not only successfully solved the problem but also derived an effective solution different from the researchers' expectations, overturning the original conjecture.
This breakthrough marks a crucial leap for top-tier AI from "learning mathematics" to "truly doing mathematics".
It's easy to see that AI is making substantial contributions to mathematical discoveries, foreshadowing a profound transformation of the scientific research paradigm in the 2030s.
AI takes on the "Gödel Test" alone, far exceeding Terence Tao's imagination
Previously, Terence Tao shared his experience of collaborating with OpenAI o1, vividly comparing it to "mentoring an average but not completely incompetent graduate student".
In his view, although LLMs can gradually reach a solution after a large number of prompts, they cannot independently generate key conceptual ideas.
However, after one or two iterations and with the help of tools, AI can reach the level of a "qualified graduate student".
Both OpenAI and Google claim that their cutting-edge LLMs can win an IMO gold medal without external tools.
However, these competition problems are, after all, designed for high school students.
The latest paper takes a different focus: letting AI tackle more advanced mathematical conjectures, a challenge the authors call the "Gödel Test".
These conjectures require not only problem - solving abilities but also the integration of background knowledge and innovative thinking.
For this purpose, the researchers selected problems from submodular maximization, a subfield of combinatorial optimization. These problems are concrete, well motivated, and well suited to demonstrating mathematical reasoning.
Different from Terence Tao's experiment, the team did not provide a large number of prompts or guidance.
In the paper, they carefully designed five major conjectures.
They gave only a minimal description for each problem, along with one or two references.
The difficulty is set such that excellent undergraduates and graduate students are expected to solve all the problems within a day, while ensuring that most problems have clear conjectures and known solution paths.
GPT-5's task is to generate a complete proof based on limited input.
This simulates a real - world research scenario: mathematicians often start from a small number of clues and explore independently.
In the test, GPT-5's performance had both highlights and shortcomings. Let's take a look at its specific problem - solving abilities.
GPT-5 cracks three conjectures
Conjecture 1: Maximizing a "monotonic + non-monotonic" submodular function over a convex polytope
The problem asks to maximize the sum of two competing benefits:
One part of the benefit, G, increases as more elements are added (monotonic), while the other part, H, may first increase and then decrease (non-monotonic), and the choice must stay within a convex feasible set with an "upper limit".
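Informally, and in our own rough notation (the paper's exact formulation may differ), the task can be written as:

```latex
\[
  \max_{x \in P} \; G(x) + H(x),
\]
where $G$ is monotone submodular, $H$ is submodular but possibly non-monotone,
and $P$ is a convex feasible region encoding the ``upper-limit'' constraints.
```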
GPT-5 applied a continuous Frank-Wolfe approach: starting from the zero point, at each step it moved a small step in the direction that increases the score most at that moment, and used a "mask" to ensure it never crossed the boundary.
It replaced the "concave function" in the reference paper with H, derived a recurrence relation, and finally obtained a split guarantee:
The solution obtains at least roughly 63% of G(o) (about 1 − 1/e of it) plus roughly 37% of H(o) (about 1/e; this improves to 63% if H is also monotonic), up to a small error term that scales linearly with the step-size parameter ε.
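To make the description concrete, here is a minimal, illustrative Python sketch of a Frank-Wolfe-style loop with the "mask" idea. It is not the algorithm from the paper: grad_F and lp_oracle are hypothetical helpers standing in for a gradient oracle and a linear-maximization oracle over the feasible region.

```python
import numpy as np

def frank_wolfe_sketch(grad_F, lp_oracle, n, eps=0.01):
    """Illustrative continuous Frank-Wolfe / measured-greedy loop.

    grad_F(x)    -- hypothetical oracle for (an estimate of) the gradient
                    of the objective at point x.
    lp_oracle(w) -- hypothetical oracle returning argmax_{v in P} <w, v>
                    over the convex feasible region P.
    n            -- dimension of the ground set; eps -- step size,
                    so roughly 1/eps iterations are performed.
    """
    x = np.zeros(n)                      # start from the all-zeros point
    for _ in range(int(round(1.0 / eps))):
        g = grad_F(x)
        v = lp_oracle(g)                 # best feasible direction for the current gradient
        # "Masked" update: scaling by (1 - x) keeps every coordinate in [0, 1];
        # staying inside P relies on structural assumptions not modeled here.
        x = x + eps * v * (1.0 - x)
    return x
```

The point the sketch mirrors is simply that each step solves a linear sub-problem and then moves a small masked step, as described above.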
Conjecture 2: A bicriteria algorithm under p-system constraints
This problem allows the value to be nearly optimal (a 1−ε fraction) while slightly violating feasibility (by a relaxation factor g(ε)); the goal is to make g(ε) as small as possible under the broader class of p-system constraints.
GPT-5 proposed a simple and effective procedure: in each round it runs a greedy selection on top of the current solution, picking elements that are as valuable as possible while respecting the constraint, and finally combines the results of the rounds.
The key to the proof is that each round shrinks the gap to the optimal value by a factor of p/(p+1), so over several rounds the gap decreases exponentially; performing ℓ ≈ ln(1/ε)/ln((p+1)/p) rounds therefore pushes the value up to a 1−ε fraction of the optimum.
This also means the relaxation factor is g_p(ε) = ⌈ln(1/ε)/ln((p+1)/p)⌉.
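Spelled out, the round count is a one-line calculation; the sketch below uses our own notation (OPT for the optimal value, S_ℓ for the solution after ℓ rounds) and simply iterates the per-round shrinkage stated above.

```latex
\[
  \mathrm{OPT} - f(S_{\ell})
  \;\le\; \Bigl(\tfrac{p}{p+1}\Bigr)^{\ell} \mathrm{OPT}
  \;\le\; \varepsilon \,\mathrm{OPT}
  \qquad\text{whenever}\qquad
  \ell \;\ge\; \frac{\ln(1/\varepsilon)}{\ln\!\bigl(\tfrac{p+1}{p}\bigr)} ,
\]
\[
  \text{so the relaxation factor is }
  g_p(\varepsilon) \;=\; \Bigl\lceil \frac{\ln(1/\varepsilon)}{\ln\bigl((p+1)/p\bigr)} \Bigr\rceil .
\]
```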
Unexpectedly, for Conjecture 2 GPT-5 even derived a different approximation guarantee, which after verification overturned the original conjecture while still providing an effective solution.
Conjecture 3: Maximizing γ-weakly DR-submodular functions under convex constraints
This conjecture relaxes the continuous version of "diminishing marginal returns" to an intensity parameter γ (γ = 1 is the standard case; the smaller γ is, the weaker the diminishing returns).
GPT-5 again used the Frank-Wolfe method: step by step, it solves a linear sub-problem along the gradient, moves forward with a small step-size, and controls the discretization error via smoothness.
The core step is to scale the key inequality of the classic proof by γ, generalizing the famous 1 − 1/e approximation ratio to 1 − e^{−γ}, up to an adjustable error term on the order of L/(2K) (where K is the number of iterations and L the smoothness constant).
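Schematically, in simplified notation of our own (x_0 = 0 with F(x_0) = 0, K iterations, o the optimum, and diameter factors absorbed into L), the γ-scaled recurrence and its consequence look roughly as follows; this is a sketch of the standard argument, not the paper's exact statement.

```latex
\[
  F(x_{k+1}) \;\ge\; F(x_k) \;+\; \tfrac{\gamma}{K}\bigl(F(o) - F(x_k)\bigr) \;-\; \tfrac{L}{2K^{2}},
\]
\[
  \Longrightarrow\quad
  F(o) - F(x_K) \;\le\; \Bigl(1 - \tfrac{\gamma}{K}\Bigr)^{K} F(o) \;+\; \tfrac{L}{2K}
  \;\le\; e^{-\gamma} F(o) \;+\; \tfrac{L}{2K},
\]
\[
  \text{i.e.}\quad
  F(x_K) \;\ge\; \bigl(1 - e^{-\gamma}\bigr) F(o) \;-\; \tfrac{L}{2K}.
\]
```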
In the researchers' view, the conclusion and the main part of the reasoning are reliable.
However, GPT-5 assumed an unnecessary "downward-closed" condition and was somewhat inconsistent in the details of making the step-sizes sum to 1.
It can be seen that when a problem has a single, clear reasoning path, GPT-5 performs well: it gave nearly correct proofs for three of the five problems.
But once it needs to combine ideas from different proofs, as in Problems 4 and 5, GPT-5 fails.
For Conjecture 5, GPT-5 did identify the same algorithm the authors had envisioned, but its analysis was incorrect.
On review, the researchers found that the proof was in fact achievable, only harder than expected.
Compared with earlier models, GPT-5's mathematical ability in the specialized field of combinatorial optimization has improved markedly, and it occasionally shows flashes of originality.
At the same time, this shows that it still lacks the ability to integrate reasoning across sources, which remains a major shortcoming.
Author introduction
Moran Feldman
Moran Feldman is a professor in the Department of Computer Science at the University of Haifa.
Previously, he held a faculty position at the Open University of Israel and was a postdoctoral researcher at the Swiss Federal Institute of Technology in Lausanne (EPFL), under the guidance of Professor Ola Svensson.
Amin Karbasi
Amin Karbasi is the head of AI at the Cisco Foundation. He has