ChatGPT cracked a six-year-old problem, but a Turing Award winner says it's too early to celebrate
One of the most scathing verdicts in academia goes like this:
This work is both good and novel.
Unfortunately, the good parts are not novel, and the novel parts are not good.
Richard Sutton, one of the founders of reinforcement learning, co-author of the textbook "Reinforcement Learning: An Introduction", and a Turing Award laureate, has now turned this joke on generative AI as a whole.
In his view, the verdict fits most of the AI we know today.
AI: The good parts are not novel, and the novel parts are not good
Sutton's core claim is blunt to the point of cruelty:
Generative AI is essentially supervised learning.
The logic of supervised learning is simple: show the model a large number of examples created by humans and train it to imitate them.
The more faithful the imitation, the higher the score.
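To make "the more faithful the imitation, the higher the score" concrete, here is a minimal sketch of the standard supervised objective: the average cross-entropy (negative log-likelihood) of human data under a model, where lower means better imitation. The toy corpus and both "models" are hypothetical; large language models are pretrained with this same kind of loss on next-token predictions, just at vastly larger scale.

```python
import math
from collections import Counter

def cross_entropy(data, model_probs):
    """Average negative log-likelihood of the human data under the model.
    Lower = the model imitates the data more faithfully."""
    return -sum(math.log(model_probs[tok]) for tok in data) / len(data)

# Hypothetical "human-written" corpus of tokens
human_data = ["a", "a", "a", "b", "b", "c"]

# Model 1: copies the empirical token frequencies of the data exactly
counts = Counter(human_data)
imitator = {tok: n / len(human_data) for tok, n in counts.items()}

# Model 2: spreads probability evenly, i.e. strays further from the data
explorer = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

print(f"imitator loss: {cross_entropy(human_data, imitator):.4f}")
print(f"explorer loss: {cross_entropy(human_data, explorer):.4f}")
```

The model that simply reproduces the data's frequencies gets the lower loss. This is Gibbs' inequality at work: cross-entropy against the data is minimized exactly when the model's distribution matches the data's, so nothing in the objective rewards producing something the data does not already contain.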
Here is the problem.
When the model stays close to its training data, the output is high quality because it reproduces good things that humans have already vetted. But it is not novel: it merely recombines what humans already know into new permutations.
When the model strays from the training data to produce something genuinely new, quality collapses, because it has no internal mechanism for judging whether the new thing is any good. It can generate, but it cannot evaluate.
This is the structural contradiction:
Within pure supervised learning, novelty and quality sit at opposite ends of a seesaw.
Push one end down and the other goes up.
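One way to watch the seesaw move in a pure imitation model is the sampling-temperature knob. This is an illustrative sketch, not Sutton's own example, and it rests on two loudly stated assumptions: the entropy of the output distribution stands in for "novelty", and the expected log-likelihood of samples under the learned data distribution stands in for "quality". The distribution `p` is hypothetical.

```python
import math

def temperature_dist(p, T):
    """Reshape distribution p with sampling temperature T: q_i is proportional to p_i**(1/T)."""
    w = [pi ** (1.0 / T) for pi in p]
    z = sum(w)
    return [wi / z for wi in w]

def entropy(q):
    """Shannon entropy in nats; an assumed proxy for how varied ('novel') samples are."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def expected_data_loglik(q, p):
    """Expected log-probability, under the learned distribution p, of samples drawn from q.
    An assumed proxy for 'quality' when resemblance to the data is the only yardstick."""
    return sum(qi * math.log(pi) for qi, pi in zip(q, p))

p = [0.6, 0.25, 0.1, 0.05]  # hypothetical learned next-token distribution
for T in [0.5, 1.0, 2.0, 4.0]:
    q = temperature_dist(p, T)
    print(f"T={T}: novelty={entropy(q):.3f}  quality={expected_data_loglik(q, p):.3f}")
```

Raising T makes outputs more varied (entropy rises) while their expected likelihood under the learned distribution falls; lowering T does the reverse. In this toy no setting raises both at once, because the only measure of "good" available is closeness to the training data.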
This is not an engineering problem. It cannot be solved by piling up more data, scaling up the model, or adding more GPUs.
Sutton made a striking point: "hallucination", the most criticized flaw of large models, is essentially a by-product of the model's attempt to be "novel".
Our dislike of hallucinations proves one thing: we don't actually want novelty at all. We only want high-quality imitation.
"The good parts are not novel, and the novel parts are not good."
The reviewer's jab in that joke turns out to be an accurate description of the inherent limits of generative AI as a whole.
True "discovery" requires three parts
Starting from first principles, Sutton broke creativity down into a three-part formula:
True discovery = Variation + Evaluation + Selective Retention.
Any genuine creativity or discovery requires all three steps, and none can be skipped:
1. Variation generates diverse possibilities. It can be random or informed by existing knowledge, but there must be real uncertainty; otherwise it is not exploration, just a table lookup.
2. Evaluation judges which variations are valuable. This requires a clear goal or standard for telling "good" from "bad".
3. Selective Retention keeps the valuable variations and lets them influence future actions and learning.
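The three steps map directly onto the simplest evolutionary search loop, essentially a (1+1) evolution strategy or stochastic hill climber. The sketch below is illustrative only: the objective `target_score`, the step size, and the function names are hypothetical choices, not anything Sutton specified.

```python
import random

def target_score(x):
    """Evaluation criterion (hypothetical goal): higher is better, peak at x = 3."""
    return -(x - 3.0) ** 2

def vary(x, rng, step=0.5):
    """Variation: propose a random perturbation of the current best candidate."""
    return x + rng.gauss(0.0, step)

def discover(steps=500, seed=0):
    rng = random.Random(seed)
    best = 0.0                              # starting guess
    best_score = target_score(best)
    for _ in range(steps):
        candidate = vary(best, rng)         # 1. variation
        score = target_score(candidate)     # 2. evaluation
        if score > best_score:              # 3. selective retention
            best, best_score = candidate, score
    return best, best_score

best, score = discover()
print(f"best x = {best:.3f}, score = {score:.5f}")
```

Each of the three marked lines is load-bearing: without the evaluation there is nothing to compare candidates against, and without the retention every improvement is thrown away, so `best` never moves toward the goal.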
These three steps are not Sutton's invention. They are the logic of natural selection, the logic of the scientific method, and the logic of human learning.
The theory of evolution: Random genetic mutation (variation) → Environmental screening (evaluation) → Survival of the fittest (selective retention).
The scientific method: Proposing a hypothesis (variation) → Experimental verification (evaluation) → Publishing a paper (selective retention).
Human learning: Trying different solutions (variation) → Checking right or wrong (evaluation) → Remembering effective methods (selective retention).
Today's generative AI covers only the first of the three steps: there is almost no evaluation, let alone selective retention.
It is like a blindfolded archer firing arrows at random. After each shot, it neither looks at the target nor adjusts its aim based on the result.
Ask it to shoot ten thousand arrows and it may occasionally hit the target, but it will never know why.
So, are scientists still useful?
By now you may feel a little uneasy: if AI can one day carry out all three steps of discovery on its own, will scientists be out of work?
Sutton's answer: they cannot be replaced, but their role will have to change fundamentally.
In a talk, he said that even an AI capable of proving mathematical theorems on its own would still need humans to tell it which problems are important.
This is not modesty but a real cognitive boundary.
Shiqian Ma, a mathematician working in optimization at Rice University, said he had used ChatGPT to prove the convergence of an algorithm he had been working on for six years.
The paper's abstract contains this sentence:
The proof was generated by ChatGPT 5.5 and verified by the author.
https://optimization-online.org/2026/05/convergence-of-bdrs-as-a-matrix-scaling-algorithm/
The algorithm is called BDRS, short for Bregman Douglas-Rachford Splitting, and it is used to solve the optimal transport (OT) problem.
Paper title: Bregman Douglas-Rachford Splitting Method
Preprint address: https://arxiv.org/abs/2509.08739
He and his co-authors designed the algorithm themselves. What eluded him for six years was a proof of its convergence, that is, a mathematically rigorous account of why it works.
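BDRS itself is defined in the papers cited above, and nothing here reproduces it. To give a feel for what a "matrix scaling algorithm" for optimal transport is, and what it means for one to "converge", here is a minimal sketch of a different, classical algorithm: Sinkhorn's iteration for entropy-regularized OT, assuming NumPy. Given source weights a, target weights b, and a cost matrix C, OT looks for a nonnegative transport plan P whose row sums equal a and column sums equal b while minimizing the total cost, the sum over i, j of C_ij P_ij. Sinkhorn adds an entropy penalty of strength eps and solves the result by alternately rescaling the rows and columns of the kernel K = exp(-C/eps). The function name, defaults, and toy data are hypothetical.

```python
import numpy as np

def sinkhorn(a, b, C, eps=1.0, iters=200):
    """Sinkhorn matrix scaling for entropy-regularized optimal transport.

    Illustrative only: this is the classical Sinkhorn iteration, NOT BDRS.
    a, b: source/target weights (each sums to 1); C: cost matrix;
    eps: entropy-regularization strength (hypothetical default).
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    errors = []
    for _ in range(iters):
        u = a / (K @ v)                   # rescale rows toward marginal a
        v = b / (K.T @ u)                 # rescale columns to match marginal b exactly
        P = u[:, None] * K * v[None, :]   # current transport plan
        errors.append(np.abs(P.sum(axis=1) - a).sum())  # remaining row-sum violation
    return P, errors

# Toy problem: move mass between three points with cost |i - j|
x = np.arange(3.0)
C = np.abs(x[:, None] - x[None, :])
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])

P, errors = sinkhorn(a, b, C)
print(np.round(P, 3))
print(f"row-sum error: first sweep {errors[0]:.2e}, last sweep {errors[-1]:.2e}")
```

The list `errors` tracks how far the row sums still are from a after each sweep; watching it shrink toward zero is convergence in the empirical sense. A convergence proof, like the one Ma spent six years on for BDRS, is the guarantee that this happens for every valid input, not just the ones you happened to try.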
When he submitted the paper to arXiv, the preprint server put it on hold, and it has stayed on hold ever since.
His guess at the reason: the word "ChatGPT" appears in the abstract, and the platform does not know how to handle such papers.
So can AI replace people like him?
His answer is no. As he put it bluntly:
I don't think AI can creatively propose such an algorithm and claim, "This is an efficient algorithm for optimal transport. Now let me try to prove its convergence."
Without human guidance, AI simply doesn't know which problem to solve.
This lines up exactly with Sutton's view: the problem itself must be defined by humans.
And it took him six years to learn to ask the right question:
"Knowing which questions to ask actually requires a very deep understanding of the subject.
In this case, I've been working on this problem for six years, so I know exactly where the difficulties lie."
These six years were not a waste but a prerequisite.
Those six years are what let him know where the proof was stuck, why every earlier approach had failed, which of ChatGPT's suggestions were worth pursuing, and which were hallucinations.
Nor did it take just one prompt; it took five months. This is the most easily misunderstood point, and he has addressed it head-on:
From January to May, over five full months of countless conversations, each prompt brought him a little closer to the proof.
He summed it up clearly:
The essence of research has not changed: it is still trial and error. What has changed is the speed of each trial. Checking a direction used to take weeks; now you can find out in a few minutes whether a path is feasible.
Even so, AI's contribution is undeniable.
Then comes his closing line, which is simply brilliant:
Back to my paper on the convergence of BDRS: I'm quite sure the proof is correct.
But if you find any errors, I take full responsibility. Please don't blame ChatGPT; it's only 3.5 years old.
The beauty of this line is that it works on two levels: it is a sincere acceptance of responsibility and an apt metaphor.
"3.5 years old" captures where AI stands right now: astonishing abilities, immature judgment.
After all, no one has ever expected a 3.5-year-old child to contribute anything.
You cannot hand the final sign-off on a proof to an AI, but you cannot pretend the AI contributed nothing either.
This is why genuine scientific discovery will not slip out of human hands.
If anything, it will sort people more ruthlessly: only those who can ask good questions deserve powerful AI.
In the future, scientists not using AI may be as outdated as astronomers not using