Karpathy, the "God of Cards", moves to Anthropic to work on the "most dangerous AI"

Friends of 36Kr · 2026-05-20 15:21
Defeat magic with magic?

On May 19th, US local time, Andrej Karpathy, co-founder of OpenAI and former head of AI at Tesla, announced that he had joined Anthropic.

"I've joined Anthropic," Karpathy wrote on the X platform. "I believe the next few years will be the most decisive stage in the frontier development of large language models. I'm really looking forward to joining this team and getting back to the front line of R&D."

According to media reports, Karpathy will form a new team under the direction of Nick Joseph, head of the pre-training team. Its core mission is to use Claude to accelerate pre-training research.

In other words, his job is to let AI optimize the AI training process itself.

This direction has a long-standing formal name in the field of AI safety: Recursive Self-Improvement (RSI). The core idea is that an AI system achieves successive leaps in capability by continuously optimizing its own training process.

For decades, RSI existed mostly in academic papers and thought experiments. However, Jack Clark, co-founder of Anthropic, published a long article on May 4th predicting that the probability of AI achieving recursive self-improvement by the end of 2028 is about 60%. On May 13th, Recursive Superintelligence (abbreviated as Recursive SI), a new company founded by Tian Yuandong, former research director of Meta FAIR, was officially announced; its core direction is also recursive self-improvement.

As computing power, data, and model capabilities simultaneously cross the critical point, RSI is being put into practice by top AI laboratories and has become a real engineering project.

01 Why Karpathy?

Karpathy's career trajectory itself explains why he is the right candidate for this position.

He was one of the earliest research scientists at OpenAI, focusing on deep learning and computer vision from 2015 to 2017. In 2017, Elon Musk recruited him to Tesla as head of AI, where he led the Autopilot vision team and took neural networks from papers to millions of mass-produced vehicles.

During his five-year tenure at Tesla, he led the construction of a closed-loop system called the "data engine", which is essentially an engineered "model self-improvement" pipeline, except that the target of improvement is a perception model rather than a language model. He left Tesla in 2022, briefly returned to OpenAI in 2023, left again after about a year, and founded the AI education company Eureka Labs in 2024.

Pre-training is the most expensive part of the large-model pipeline, and the part most dependent on computing power and engineering experience. TechCrunch described Karpathy as "one of the few researchers who can straddle both LLM theory and large-scale training practice."

02 RSI Moves from Papers to Engineering

The timing of Karpathy's move is no accident. Two weeks earlier, Jack Clark, co-founder of Anthropic, laid out a detailed argument in issue 455 of his newsletter "Import AI".

He wrote that, after spending several weeks reading hundreds of public data sources, he judged the probability of recursive self-improvement occurring by the end of 2028 to be 60%.

Clark's argument is based on a set of verifiable benchmark trends.

On SWE-Bench, which tests whether AI can resolve real GitHub issues, the best score rose from about 2% for Claude 2 at the end of 2023 to 93.9% for Claude Mythos Preview. The "time span for AI to reliably complete tasks", as measured by METR, grew from about 30 seconds for GPT-3.5 in 2022 to about 12 hours for Opus 4.6 in 2026. On CORE-Bench, a benchmark that tests whether AI can reproduce academic papers, the highest score was only 21.5% at launch in September 2024; by December 2025 the benchmark had been "solved" by Opus 4.5 at 95.5%.

On an internal Anthropic benchmark in which models optimize the training of a small language model, the speedup rose from 2.9x for Opus 4 in May 2025 to 52x for Claude Mythos Preview in April 2026. Human researchers, by comparison, need 4 to 8 hours to achieve a 4x speedup on the same task.

Clark argues that "99% of the hard work" in AI R&D, including data cleaning, running experiments, parameter search, and kernel optimization, already falls within the capabilities of current models. Even if AI for now lacks the creativity to overturn paradigms, automating the engineering work alone is enough to significantly accelerate iteration.

On May 7th, Anthropic officially released "The Research Outline of The Anthropic Institute", which lists "AI for AI R&D" as one of four major research directions. It explicitly proposes building telemetry to measure the acceleration of AI R&D and using it as an early warning signal for RSI. It also proposes exploring what intervention points exist if an "intelligence explosion" is approaching, and which entities (governments, companies, or others) should hold the power to intervene.

Clark told Axios, "My prediction is that by the end of 2028, it's more likely than not that we will have an AI system where you can tell it 'go and make a better version of yourself' and it will do it completely autonomously."

The engineering groundwork was laid even earlier.

On April 14th, 2026, the Anthropic Fellows project publicly announced an experiment testing whether Claude Opus 4.6 can autonomously make progress on "weak-to-strong supervision", a key problem in alignment research. The work covered task decomposition, hypothesis generation, evaluation design, and iterative optimization. The aim was to have an AI agent take on a research process as a whole.

03 One of the Most Closely Watched Directions in the Entire AI Community

The race is happening on multiple fronts simultaneously.

As noted at the beginning of this article, Recursive Superintelligence, a company with eight founders including Tian Yuandong, former research director of Meta FAIR, has officially launched.

Earlier signs are also visible. Jack Clark mentioned in Import AI that OpenAI's internal goal is to "build an automated AI research intern by September 2026", and DeepMind is more cautious but also said that "automation of alignment research should be promoted when feasible". From the internal goals of large companies to independent startup projects, RSI has become a common strategic direction for frontier laboratories.

There is an unavoidable paradox here.

Anthropic's founding narrative is based on "AI safety first". And RSI is exactly one of the capabilities that the AI safety community has long been most worried about.

Pedro Domingos, a machine-learning professor at the University of Washington, responded to Clark's 60% probability judgment, saying, "Since the birth of the LISP language in the 1950s, AI has had the ability to self-construct. The real question is whether this process can bring increasing returns - there is currently no evidence to support this."

The critics' core question is not whether RSI is "possible", but whether it can generate exponentially growing returns. If the efficiency of each generation of AI self-optimization improves only linearly, or even declines, the impact of this route will stay within a controllable range.
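The distinction between linear, compounding, and diminishing returns can be made concrete with a toy simulation. This is only an illustrative sketch: the function `simulate`, the three step rules, and every number in it (a starting capability of 1.0, a gain of 0.5, ten generations) are assumptions chosen for illustration, not figures from Clark, Domingos, or any lab.

```python
from typing import Callable, List


def simulate(generations: int, step: Callable[[float, int], float]) -> List[float]:
    """Apply a per-generation update rule and record capability after each generation."""
    capability = 1.0  # hypothetical starting capability (arbitrary units)
    history = [capability]
    for generation in range(generations):
        capability = step(capability, generation)
        history.append(capability)
    return history


# Linear returns: every generation adds the same fixed gain.
linear = simulate(10, lambda c, g: c + 0.5)

# Compounding returns: every generation multiplies capability by a fixed factor.
compounding = simulate(10, lambda c, g: c * 1.5)

# Diminishing returns: the added gain halves with each generation.
diminishing = simulate(10, lambda c, g: c + 0.5 * 0.5 ** g)

for name, history in [("linear", linear), ("compounding", compounding), ("diminishing", diminishing)]:
    print(f"{name:12s} after 10 generations: {history[-1]:.3f}")
```

Under these made-up parameters the compounding curve pulls far ahead of the linear one within ten generations, while the diminishing curve flattens out below 2.0. The disagreement described above is over which of these regimes real systems will follow.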

AI safety researcher Eliezer Yudkowsky's response to Clark's 60% judgment was short and terrifying: "Then you'll die with the rest of us."

Clark did not sidestep this in his article. In Import AI, he offered a calculation: if today's alignment techniques are 99.9% accurate, accuracy falls to about 95% after 50 generations of iteration and to about 60% after 500 generations, a compounding drift similar to genetic mutation. Whether alignment can be reliably passed down as a constraint to each successive generation of models, in a loop where AI participates in its own training, remains an open question.
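The figures in this calculation are consistent with simple compounding, as the short check below shows. It is a sketch under one assumption that the article does not state explicitly: each generation independently preserves 99.9% of the previous generation's alignment, so the retained fraction after n generations is 0.999 raised to the power n. The function name `retained_alignment` is hypothetical.

```python
def retained_alignment(per_generation_accuracy: float, generations: int) -> float:
    """Fraction of alignment retained if every generation preserves the same share.

    Assumes independent, multiplicative drift per generation (an assumption,
    not a claim about how real training pipelines behave).
    """
    return per_generation_accuracy ** generations


for n in (1, 50, 500):
    print(f"{n:3d} generations: {retained_alignment(0.999, n):.1%}")
```

With these inputs the result is roughly 95.1% after 50 generations and 60.6% after 500, in line with the "about 95%" and "about 60%" quoted above.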

Anthropic's answer seems to be that "those who understand the risks best are the most suitable to do this": advance capability research and alignment research in parallel, and keep the pace of engineering ahead of the point where control would be lost. Whether this answer holds up will have to be tested against the data subsequently released by Karpathy's team and the Anthropic Institute.

Anthropic made a relatively rare commitment in the research outline: to publicly release "how our work is accelerated by new AI tools" and "data related to the potential recursive self-improvement of AI systems". Whether it follows through will be a key test of whether Anthropic's bet on the RSI route is an engineering project or a positioning strategy.

With Karpathy joining Anthropic, it is increasingly clear that the focus of competition in the AI industry is shifting from "training larger models with more computing power" to "letting AI participate in its own training process".

However, this may be a path with great potential but also great danger.

This article is from the WeChat official account "Tencent Technology", author: Worth Paying Attention To. It is published by 36Kr with authorization.