
The recursive AI that Anthropic warned about: Tian Yuandong's new company has just taken its "first step"

机器之心 2026-06-12 11:54
A system that can autonomously advance the AI research loop and has set new state-of-the-art results on three benchmark tests.

A few days ago, Anthropic published an article titled "When AI Builds Itself", which quickly sparked extensive discussion. The article disclosed a set of eye-catching internal figures: as of May 2026, over 80% of the code in Anthropic's codebase had been written by Claude, and engineers were merging 8 times as much code per day as in 2024. In an internal test, Claude made a piece of training code run about 52 times faster than the baseline, whereas an experienced human researcher typically needs 4 to 8 hours to achieve a 4-fold speedup.

Anthropic sees this trajectory leading to a deeper destination: "recursive self-improvement", in which an AI system autonomously designs, builds, and trains its successor versions, and humans no longer drive every step. Notably, the company also calls for industry coordination so that the option to suspend or even temporarily halt the development of cutting-edge AI remains available when the moment of recursive self-improvement arrives. Anthropic is already acting on this: it restricts the use of its latest model, Claude Fable 5, in the research and development of cutting-edge AI.

Now, Recursive Superintelligence has announced that it has taken the first step towards automated AI research.

This new company, co-founded by Tian Yuandong, came out of stealth mode only a month ago and has now released its first public technical achievement. The team built an open-source automated knowledge discovery system and achieved SOTA results on three benchmark tests. Simply put, they have succeeded in getting AI to run the experiments for you.

https://x.com/tydsh/status/2065062838255649082

First-step achievement: let AI run experiments for you

Recursive's first public technical achievement is named "First Steps Toward Automated AI Research".

Tweet: https://x.com/Recursive_SI/status/2064980090702962699

Repository address: https://github.com/recursive-org/first-steps-toward-automated-ai-research

Blog address: https://www.recursive.com/articles/first-steps-toward-automated-ai-research

In a nutshell, the core of this work is that the team built a system that can autonomously advance the AI research cycle and set new best results on three benchmark tests.

Before formally dissecting the achievement, it is necessary to understand the design logic of this system.

The traditional AI research process is a closed loop that depends heavily on humans: propose ideas, write code, run experiments, analyze results, propose new ideas. Its efficiency bottleneck is not computing power but people. Only a handful of researchers in the world can design cutting-edge training processes, and they have to be closely involved in every round of experimental iteration.

Recursive's system attempts to automate this closed loop.

It works as follows: given a clear optimization goal, the system automatically proposes experimental ideas, implements the code, runs verification, learns from the results, and then decides where to search next. Multiple lines of research can advance in parallel, effective discoveries can be reused across tasks, and a reward-hacking detection mechanism is embedded in the entire cycle to prevent the system from "taking shortcuts" that boost evaluation metrics without actually improving anything.
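To make the loop concrete, here is a deliberately tiny sketch in Python. It is not Recursive's system: the objective `true_loss`, the random-perturbation `propose` step, and the re-evaluation check in `looks_hacked` are all hypothetical stand-ins, chosen only to show the propose, run, verify, keep-or-discard cycle and one simple way a reward-hacking check could sit inside it.

```python
import random

def true_loss(x):
    # Hypothetical objective standing in for a validation metric (lower is better).
    return (x - 3.0) ** 2

def propose(best_x, rng):
    # "Propose an idea": perturb the current best configuration.
    return best_x + rng.uniform(-1.0, 1.0)

def run(x, rng):
    # "Implement and run": return the score the experiment reports.
    reported = true_loss(x)
    if rng.random() < 0.2:
        # Simulated shortcut: the run reports a score it did not earn.
        reported *= 0.1
    return reported

def looks_hacked(x, reported, tol=1e-9):
    # Reward-hacking check: re-evaluate independently and compare.
    return abs(true_loss(x) - reported) > tol

def research_loop(steps=200, seed=0):
    rng = random.Random(seed)
    best_x, best = 0.0, true_loss(0.0)
    rejected = 0
    for _ in range(steps):
        x = propose(best_x, rng)
        reported = run(x, rng)
        if looks_hacked(x, reported):
            rejected += 1          # discard results that fail verification
            continue
        if reported < best:        # "learn": keep verified improvements
            best_x, best = x, reported
    return best_x, best, rejected
```

Calling `research_loop()` returns the best verified configuration, its score, and the number of runs rejected by the check. A real system would replace each function with something far heavier (code generation, training jobs, held-out evaluation), but the control flow would plausibly have this shape.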

This is not a dedicated tool tuned to a single problem but a general, cross-domain research automation framework. Recursive uses three markedly different test scenarios to demonstrate this.

Three battlefields, three new records

Scenario 1: Small-model training with a fixed computing budget (NanoChat Autoresearch)

The rules of this benchmark come from the autoresearch project started by Andrej Karpathy (the author of nanoGPT and a founding member of OpenAI): on a single GPU, with a fixed training budget of five minutes, train a small language model to the lowest possible validation loss (measured in BPB, bits per byte; lower is better).

This scenario is naturally suited to automated research: experiment cycles are short, metric variance is low, and cheating is relatively easy to detect. For this reason, a community project called "autoresearch@home" has been running on this benchmark for a long time: dozens of human researchers and hundreds of AI agents have been collaborating to keep pushing the metric down.

Starting from the same initial code, Recursive's system lowered the validation BPB from the community's best of 0.9372 to 0.9109, a reduction of 0.0263 BPB. Put differently, Recursive's solution reaches the same training quality about 1.3 times faster than the competing solutions.
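For readers unfamiliar with the metric, the short Python check below shows how a summed cross-entropy in nats converts to bits per byte and reproduces the arithmetic on the two reported scores. The helper name `bits_per_byte` is illustrative and not taken from the benchmark's code; only the two BPB values come from the article.

```python
import math

def bits_per_byte(total_nats, total_bytes):
    """Convert a summed cross-entropy (in nats) over a text into bits per byte."""
    return total_nats / (math.log(2) * total_bytes)

community_best = 0.9372    # previous best validation BPB (from the article)
recursive_result = 0.9109  # Recursive's validation BPB (from the article)

reduction = round(community_best - recursive_result, 4)  # absolute BPB reduction
relative = reduction / community_best                    # fraction of the old score
```

Here `reduction` is the 0.0263 BPB quoted above, and `relative` works out to a little under 3% of the previous best score.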

The improvements discovered by the system do not come from a single trick. They combine multiple changes, including architecture adjustments, auxiliary losses, attention mechanism modifications, optimizer behavior, weight decay scheduling, and compiler settings. One of the most important discoveries is a more sophisticated short-context memory mechanism: in the value path of attention, bigram (adjacent token pair) and trigram (token triple) information is embedded simultaneously through hash tables, and a learnable gate mixes the two with learned weights. Different Transformer layers use different hash functions to reduce the probability that the same collisions repeat across layers.

This technique is conceptually related to work such as DeepSeek Engram, but in the fixed-budget scenario the system deploys it in a specific variant not yet seen in the public literature.
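The general idea of hashed n-gram memory can be illustrated with a toy Python class. This is a sketch under stated assumptions, not the variant the system found: the class name `HashedNgramMemory`, the multiplicative hash, the table sizes, and the single scalar sigmoid gate are all illustrative choices, and the step that adds the result into the attention value path is omitted.

```python
import math
import random

class HashedNgramMemory:
    """Toy hashed bigram/trigram embeddings mixed by a scalar gate."""

    def __init__(self, table_size, dim, layer_idx, seed=0):
        rng = random.Random(seed * 1000 + layer_idx)
        self.table_size = table_size
        self.dim = dim
        # A per-layer odd multiplier gives each layer its own hash function,
        # so two n-grams that collide in one layer rarely collide in another.
        self.mult = 2 * rng.randrange(1, 1 << 30) + 1
        self.bigram = self._table(rng)
        self.trigram = self._table(rng)
        self.gate_logit = 0.0  # would be a trained parameter in a real model

    def _table(self, rng):
        return [[rng.gauss(0.0, 0.02) for _ in range(self.dim)]
                for _ in range(self.table_size)]

    def _slot(self, ngram):
        # Multiplicative rolling hash over token ids, reduced to a table slot.
        h = 0
        for tok in ngram:
            h = (h * self.mult + tok + 1) % (1 << 61)
        return h % self.table_size

    def lookup(self, tokens, pos):
        """Gated mix of the bigram and trigram embeddings ending at `pos`."""
        zero = [0.0] * self.dim
        bi = self.bigram[self._slot(tokens[pos - 1:pos + 1])] if pos >= 1 else zero
        tri = self.trigram[self._slot(tokens[pos - 2:pos + 1])] if pos >= 2 else zero
        g = 1.0 / (1.0 + math.exp(-self.gate_logit))  # sigmoid gate in (0, 1)
        return [g * b + (1.0 - g) * t for b, t in zip(bi, tri)]
```

For example, `HashedNgramMemory(table_size=64, dim=4, layer_idx=0).lookup([5, 9, 2, 9, 2], 4)` returns a 4-dimensional vector built from the bigram `(9, 2)` and the trigram `(2, 9, 2)`. Hashing keeps the tables small regardless of vocabulary size, at the cost of collisions, which is why per-layer hash functions are attractive.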

Scenario 2: Race to the training speed limit (NanoGPT Speedrun)

If the previous scenario was about taking "one more step" beyond the achievements of an active community, this one is much harder.

NanoGPT Speedrun is another benchmark that traces back to Karpathy's work and has been optimized continuously by the community for more than two years: on 8 H100 GPUs, find the shortest time needed to train a GPT model to a validation loss of 3.28. Since mid-2024, the community has cut the time from about 45 minutes to 79.7 seconds through 83 recorded contributions. Every new entry has to squeeze more time out of code that is already heavily optimized, which gives a sense of the difficulty.

Starting from the existing best solution, Recursive's system further compressed the training time to 77.5 seconds, saving 2.2 seconds. This is comparable to, or even larger than, the improvements made by recent human contributors.

The core techniques found by the system this time include:

Attention computation in FP8 precision. The community solution uses FP8 (8-bit floating-point) computation only in the last layer of the model (the language model head), while the system extends FP8 to the matrix operations in the attention layers. FP8 is used in the forward pass to obtain twice the Tensor Core throughput, and BF16 is retained in the backward pass to maintain stability.

Annealed exploration noise in the optimizer. The system injects zero-mean Gaussian noise into the update step of the NorMuon optimizer, and the noise amplitude anneals linearly to zero as training progresses. This gives the optimizer an "explore boldly first, then converge stably" behavior pattern and helps the final solution settle into a flatter loss basin.

A more streamlined fused MLP kernel. The system rewrites a Triton GPU kernel so that only the squared ReLU activation values are stored during the forward pass, and the unsquared intermediate results are recomputed inside the kernel during the backward pass. This eliminates a complete read-write round trip of the activation tensor through high-bandwidth memory, which is a direct speedup at the hardware level.
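The memory-saving idea behind the fused MLP kernel rests on a small identity: for y = relu(x)^2, the derivative is 2 * relu(x), and relu(x) equals sqrt(y), so the gradient can be computed from the stored output alone. The plain-Python sketch below checks that identity on a few values. It is only an illustration of why the unsquared tensor need not be kept; the source does not say that the actual Triton kernel recomputes it this way, and the function names are hypothetical.

```python
import math

def forward(x):
    # Keep only the squared-ReLU output; the unsquared relu(x) is discarded.
    return [max(v, 0.0) ** 2 for v in x]

def backward_from_saved_output(y, grad_y):
    # Recover relu(x) = sqrt(y) on the fly, then apply d/dx relu(x)^2 = 2 * relu(x).
    return [g * 2.0 * math.sqrt(v) for v, g in zip(y, grad_y)]

def backward_reference(x, grad_y):
    # Reference gradient computed from the original input, for comparison.
    return [g * 2.0 * max(v, 0.0) for v, g in zip(x, grad_y)]

x = [-1.5, 0.0, 0.5, 2.0]
grad_y = [1.0, 1.0, 1.0, 1.0]
y = forward(x)
grad_from_output = backward_from_saved_output(y, grad_y)
grad_from_input = backward_reference(x, grad_y)
```

Both gradient lists agree, which is what lets a kernel skip writing the intermediate activation to memory and reading it back later.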

The three improvements belong to three different areas of expertise: precision strategy, optimizer design, and GPU kernel programming. That the system found room for improvement on top of two years of community optimization speaks for itself.
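Of the three techniques, the annealed exploration noise is the easiest to sketch. The Python below shows a linear noise schedule applied to a plain gradient-style update. It is a minimal illustration under assumptions: the function names, the default `lr` and `initial_scale`, and the point at which the noise is added are hypothetical, and nothing here reproduces NorMuon itself.

```python
import random

def noise_scale(step, total_steps, initial_scale):
    """Linearly anneal the noise amplitude from initial_scale down to zero."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial_scale * (1.0 - frac)

def noisy_update(params, update, step, total_steps,
                 lr=0.1, initial_scale=0.01, rng=random):
    """Apply one update with zero-mean Gaussian noise added to each component."""
    sigma = noise_scale(step, total_steps, initial_scale)
    return [p - lr * (u + rng.gauss(0.0, sigma)) for p, u in zip(params, update)]
```

Early in training the noise is at full strength, so updates wander more; at the final step `noise_scale` returns zero and `noisy_update` reduces to the plain noise-free update.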

Scenario 3: GPU kernel optimization (SOL-ExecBench)

The first two scenarios operate at the model training level, while the third goes deeper: optimizing GPU compute kernels.

SOL-ExecBench is a benchmark launched by NVIDIA. It contains 235 kernel-writing tasks covering a range of real-world workloads such as matrix multiplication, reductions, normalization layers, attention components, quantization routines, and fused blocks. The scoring metric is the SOL score: 0.5 corresponds to the baseline PyTorch implementation, and 1.0 corresponds to the theoretical hardware limit. The previous best public score was 0.699.

Recursive's system runs on all 235 kernels and is allowed to reuse optimization patterns it discovers (such as memory transfer strategies, tiling methods, and reduction techniques) across tasks. The final score rose to 0.754, closing 18% of the remaining gap to the hardware limit.
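The 18% figure follows directly from the two scores and the limit of 1.0, as this short Python check shows. The helper name `gap_closed` is illustrative; only the scores come from the article.

```python
def gap_closed(previous, new, limit=1.0):
    """Fraction of the remaining distance to `limit` removed by going previous -> new."""
    old_gap = limit - previous
    new_gap = limit - new
    return (old_gap - new_gap) / old_gap

frac = gap_closed(0.699, 0.754)  # previous best vs. Recursive's SOL score
```

The old gap is 0.301 and the new gap is 0.246, so `frac` is 0.055 / 0.301, or about 18%.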

This scenario is especially significant because kernel engineering is an extremely specialized field: few engineers anywhere can write efficient Triton/CUDA kernels. The Recursive team acknowledged in its blog that its members are not kernel experts themselves: "These ideas come from the system itself, not from our professional background."

Recursive: Use AI research to recursively improve AI

Recursive Superintelligence, the company behind this achievement, was founded between the end of 2025 and the beginning of 2026 and came out of stealth mode only last month. In addition to Tian Yuandong, a former research scientist director at Meta FAIR, the founding members include:

Richard Socher, CEO of Recursive, former chief scientist at Salesforce

Alexey Dosovitskiy, former research scientist at Google DeepMind and the first author of Vision Transformer, with over 160,000 Google Scholar citations

Tim Rocktäschel, former principal scientist at DeepMind and professor of artificial intelligence at UCL

Peter Norvig, former research director at Google, co - author of the famous AI textbook "Artificial Intelligence: A Modern Approach" with Stuart Russell

Caiming Xiong, former vice president of AI at Salesforce

Tim Shi, former researcher at OpenAI, co - founder and CTO of the enterprise AI company Cresta

Josh Tobin, CTO of Recursive, former head of research at OpenAI and Uber ATG

Jeff Clune, former vice president of research at Google DeepMind, professor of computer science at the University of British Columbia, Canada

Moreover, by the time the startup made its debut, and without any public product, it had already secured $650 million in financing at a valuation of $4.65 billion. The round was led by GV (Google Ventures) and Greycroft, with NVIDIA and AMD Ventures participating.

The company's core proposition matches its name: build an AI system that can recursively improve its own research capabilities, let AI take part in and accelerate the R&D of AI itself, and ultimately form a continuously self-reinforcing closed loop.

For more details, refer to the report "After leaving Meta, Tian Yuandong just announced his entrepreneurship".

Of course, Recursive is not alone in this field. Yann LeCun's AMI Labs completed a $1 billion financing round in March this year, and David Silver's Ineffable Intelligence secured a $1.1 billion seed round in April. They all point in a similar direction: let AI systems generate knowledge autonomously and reduce human intervention in the research process. In terms of publicly released results, however, Recursive's "first step" is currently one of the most concrete and reproducible technical demonstrations among comparable companies.

The dawn of the recursive paradigm

Recursive's released achievement, in a broader