AI building AI on its own, with 60% probability before the end of 2028. An Anthropic co-founder can't sit still.
AI systems may soon be able to build themselves!
The person who said this is Jack Clark, co-founder of Anthropic.
On May 4th, he posted on X: "I think there's a 60% chance that recursive self-improvement (RSI) will occur before the end of 2028."
Besides co-founding Anthropic, Clark is the founder and main writer of the newsletter "Import AI", and has tracked the progress of AI capabilities for years.
Alongside the post, he published a full analysis in "Import AI".
https://importai.substack.com/p/import-ai-455-automating-ai-research
In his own words: "This is a big deal. I don't know how to make sense of it. It's an idea I'm reluctant to accept: its impact is so huge that it makes me feel insignificant, and I'm not sure society is ready for the changes automated AI research would bring."
Clark wrote in the article: If this day comes, humanity will cross a "Rubicon" and enter an almost unpredictable future.
He doesn't think it will happen in 2026, but he predicts that within a year or two there may be a proof of concept on non-cutting-edge models: a model that trains its own successor end to end.
Clark's conclusion rests mainly on public information: papers on arXiv, bioRxiv, and NBER, plus his continuous observation of the products shipped by the major cutting-edge laboratories. From these, he pieced together a panoramic view of AI progress.
In his view, all the components for the engineering production of AI are basically in place today. The remaining question is: when will the model accumulate enough creativity to start driving the advancement of the cutting edge like a human researcher?
Four years, from 30 seconds to 12 hours
Clark's core argument is a set of capability progress curves.
Let's first look at the timeline chart of METR.
https://metr.org/time-horizons/
METR is an organization focused on AI capability evaluation. It tracks one metric: take the tasks an AI system can complete independently with a 50% success rate, and measure how long a skilled human would need to do the same tasks.
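As a rough illustration of how such a 50% time horizon can be estimated (a minimal sketch of the idea only; METR's actual methodology is more involved, and the data below is made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical trial results: how long each task takes a skilled human
# (minutes), and whether the AI completed it independently.
human_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])
succeeded     = np.array([1, 1, 1, 1,  1,  0,  1,   0,   0,   0])

# Fit success probability against log task length.
X = np.log(human_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, succeeded)

# The 50% horizon is where the fitted logit crosses zero: w*log(t) + b = 0.
t50 = float(np.exp(-clf.intercept_[0] / clf.coef_[0, 0]))
print(f"estimated 50% time horizon: {t50:.0f} human-minutes")
```

On that metric, the trajectory looks like this: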
In 2022, the figure for GPT-3.5 was 30 seconds;
In 2023, GPT-4 pushed this figure to 4 minutes;
In 2024, o1 pushed it to 40 minutes;
In 2025, GPT-5.2 (high-end version) reached 6 hours;
In 2026, Claude Opus 4.6 has reached 12 hours.
In four years, from 30 seconds to 12 hours, it has increased by 1440 times!
AI capability researcher Ajeya Cotra believes that before the end of 2026, this figure is expected to exceed 100 hours.
A 100-hour horizon would cover many multi-day software and research-assistance tasks.
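A back-of-the-envelope check on that trend, using only the article's own numbers (not METR's official fit):

```python
import math

start_seconds = 30        # GPT-3.5, 2022
end_seconds = 12 * 3600   # Claude Opus 4.6, 2026
years = 4

growth = end_seconds / start_seconds        # 1440x
doublings = math.log2(growth)               # ~10.5 doublings in 4 years
doubling_months = years * 12 / doublings    # ~4.6 months per doubling

# Going from 12 hours to 100 hours takes log2(100/12) ≈ 3.1 more doublings,
# i.e. roughly 14 months at the four-year average pace.
months_to_100h = math.log2(100 / 12) * doubling_months
print(round(doubling_months, 1), round(months_to_100h, 1))
```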
Programming ability is also taking off.
SWE-Bench measures AI's ability to solve real GitHub engineering issues. At the end of 2023, Claude 2 scored 2%. This year, Claude Mythos Preview has reached 93.9%, essentially saturating the benchmark.
CORE-Bench measures something else: give an AI a paper and its code repository, and have it independently reproduce the experimental results. This is one of the most basic daily tasks of an AI researcher.
When the test launched in September 2024, the best score was 21.5%. In December 2025, Opus 4.5 on a Claude Code scaffold scored 77.78% on automatic verification, and 95.5% after manual verification. The benchmark's maintainers said CORE-Bench has been solved.
https://hal.cs.princeton.edu/corebench_hard
In 15 months, from 21.5% to 95.5%.
MLE-Bench measures AI's ability to compete in Kaggle competitions independently, covering 75 real competitions.
When it was released in October 2024, the highest score was 16.9%. By February 2026, the combination of Gemini 3 and search tools had reached 64.4%.
https://github.com/openai/mle-bench
There is also a test inside Anthropic: have the model speed up the CPU-only training code of a small language model as much as it can, with the unoptimized version as the baseline.
In May 2025, Claude Opus 4: 2.9 times;
In November 2025, Opus 4.5: 16.5 times;
In February 2026, Opus 4.6: 30 times;
In April 2026, Claude Mythos Preview: 52 times.
In less than a year, it has increased from 2.9 times to 52 times.
This is the progress speed of AI in optimizing AI training code.
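How such a speedup number might be scored, in a minimal sketch (train_baseline and train_optimized are hypothetical stand-ins for the reference code and the model's rewrite; this is not Anthropic's actual harness):

```python
import time

def timed(fn, *args) -> float:
    """Wall-clock one training run."""
    t0 = time.perf_counter()
    fn(*args)
    return time.perf_counter() - t0

def speedup(train_baseline, train_optimized, config) -> float:
    base = timed(train_baseline, config)
    opt = timed(train_optimized, config)
    # Both runs must hit the same training target (e.g. a loss threshold)
    # for the ratio to be meaningful.
    return base / opt  # e.g. 2.9x for Opus 4, 52x for Mythos Preview
```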
AI is already taking over the 99% that is engineering work
Here is a key question: In AI research, how much is pure engineering and how much is real creativity?
Clark presented a framework and quoted Edison's words: "Genius is 1% inspiration and 99% perspiration."
He believes that the same is true for AI research.
A typical AI research cycle is like this: take an existing system, scale it up in a certain dimension, observe where problems start to occur, fix the engineering problems, and then scale up again.
In this process, most of the work is data cleaning, running experiments, adjusting parameters, reading papers, and reproducing results. These are all "perspiration", not "inspiration".
Occasionally there are inventions that genuinely change the paradigm, such as the Transformer architecture or the Mixture of Experts (MoE) model. But that is the 1%. The other 99%, the engineering work, is rapidly being taken over by AI, so perspiration is less and less the bottleneck.
Clark listed several signs:
AI can already manage other AIs. In tools like Claude Code and OpenCode, a single AI can act as a "project manager", farm tasks out to multiple sub-AIs in parallel, and then aggregate the results.
This is not fundamentally different from how a human research team is organized.
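Schematically, the pattern looks like this (a minimal sketch; run_agent is a hypothetical stand-in for a call into a coding agent, not the actual API of Claude Code or OpenCode):

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str) -> str:
    # Hypothetical stand-in: a real setup would spawn a sub-agent session
    # for this subtask and return its output.
    return f"[sub-agent output for: {task}]"

def manage(subtasks: list[str]) -> str:
    # The "project manager": fan subtasks out in parallel...
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        results = list(pool.map(run_agent, subtasks))
    # ...then aggregate sub-agent outputs into one deliverable.
    return "\n\n".join(f"## {t}\n{r}" for t, r in zip(subtasks, results))

print(manage(["write the parser", "add regression tests"]))
```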
PostTrainBench tests one thing: can an AI fine-tune an open-source small model on its own to improve its performance on a given task?
This work is usually done by experienced researchers at cutting-edge laboratories.
As of March 2026, AI systems achieve about half the effect of human researchers on this task: improvements of roughly 25% to 28%, against a human baseline of 51%.
https://posttrainbench.com/
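The shape of the task, as a minimal sketch (model name, data file, and hyperparameters here are illustrative assumptions, not PostTrainBench's actual configuration):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "EleutherAI/pythia-160m"  # assumed small open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical task data: plain-text examples for causal-LM fine-tuning.
data = load_dataset("text", data_files={"train": "task_train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    # Collator builds input/label tensors for causal LM (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```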
There is also a proof-of-concept of "automated alignment research" inside Anthropic: let a group of AI agents independently tackle AI safety research problems.
The result is that the solutions given by AI exceed the baseline of Anthropic's human researchers.
https://www.anthropic.com/research/automated-alignment-researchers
Based on these pieces of evidence, Clark's judgment is that AI can already automate most of AI engineering today. It's not entirely clear how much of AI research can be automated, but the signs are already very obvious.
Doubts have also emerged
After Clark's post, some doubts emerged in the industry.
Pedro Domingos, a machine learning professor at the University of Washington and author of "The Master Algorithm", replied: "AI has been able to build itself since the invention of LISP in the 1950s. The question is whether the process yields increasing or diminishing returns, and so far there is no evidence for the former."
Recursive self-improvement sounds like science fiction, but the existence of a loop doesn't mean the loop compounds. If each generation of AI gets only marginally better at optimizing itself, rather than exponentially better, the impact will be limited.
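A toy model of that distinction (purely illustrative, not a forecast): compound a per-generation gain that either stays constant or decays.

```python
def run(generations: int, gain_at) -> float:
    capability = 1.0
    for n in range(generations):
        capability *= 1.0 + gain_at(n)  # each generation improves the next
    return capability

# Constant returns compound explosively; decaying returns flatten out.
print(run(20, lambda n: 0.5))            # ~3325x
print(run(20, lambda n: 0.5 / (n + 1)))  # ~5x
```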
Some people also questioned the conceptual boundaries. "Is there an authoritative definition of RSI?" asked a researcher named Dan Brickley.
Another more pointed observation came from the account @crepesupreme:
30% in 2027, 60% in 2028. A 30-percentage-point jump in probability within a year implies a discontinuous capability event between 2027 and 2028. What is that specific event?
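The observation can be unpacked in conditional terms (simple arithmetic on Clark's two numbers):

```python
p_by_2027 = 0.30
p_by_2028 = 0.60

# Probability RSI first arrives in 2028, given it hasn't happened by end-2027:
p_2028_given_not_2027 = (p_by_2028 - p_by_2027) / (1 - p_by_2027)
print(round(p_2028_given_not_2027, 2))  # ~0.43, a higher per-year rate than 2027's 0.30
```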
Clark answered the implied question in the newsletter: he believes AI research still needs some creative breakthroughs before a true "self-R&D" loop can start, and AI has not yet shown transformative performance there. That is why he gives only 30% for 2027; if that gap is filled before the end of 2028, the probability rises to 60%.
But he also admits that he is predicting probabilities, not exact time points.
Some people also asked him: "You work at Anthropic. Why do you dig through public data? Why not just go downstairs and ask the researchers?"
Clark's answer is: he uses public data because public data is credible. What he wants is not an internal judgment, but a conclusion that anyone can independently verify.
The window is still open, but it's narrowing
In the newsletter, Clark also explains why he doesn't give a higher probability for 2027.
Because AI research still demands a measure of creative intuition, and there AI shows only "tempting early signs", no systematic breakthrough.
He listed two signs. One: the Gemini model took part in work on the Erdős problems and, out of roughly 700, solved one in a way mathematicians judged to show a degree of genuine originality.
The other: institutions including Stanford and UBC, collaborating with Google DeepMind, found AI playing a "very substantial role" in the discovery of new mathematical proofs.
These results may be some early signs on the timeline of AI capability evolution.
Clark estimates that if what he describes has not happened by the end of 2028, it means the current technological path has a fundamental capability ceiling that still takes human creativity to break through.
More importantly, there are questions after "if it occurs".
When Anthropic announced the establishment of The Anthropic Institute in March 2026, the official statement said:
If the recursive self - improvement of AI systems really starts to happen, then who in the world should be informed, and how should these systems be governed?
https://www.anthropic.com/news/the-anthropic-institute
Even Anthropic itself doesn't have a complete answer to this question.
Clark gave a more technical concern in the newsletter article: if today's alignment technology has an accuracy of 99.9