Four AI Scientists Debate: How to Respond When AI Achieves Self-Evolution

The runaway risk of AI self-evolution and human-AI symbiosis.

About a week ago, Anthropic, which is preparing for an IPO, published an article on its official blog titled "When AI Builds Itself".

The day the article was published, AI safety was once again pulled back to the center of public debate.

In this article, Anthropic discussed an issue called "AI self-evolution" and pointed out that "AI has already been able to participate in the work of building more powerful models for itself, which is much faster than we expected."

AI self-evolution is not actually a new technology. Ever since AI technology first emerged, people have been thinking about how to let AI take part in its own evolution.

It is much like what people in embodied intelligence now imagine: using humanoid robots to build humanoid robots.

In fact, even as AI scientists fear AI's capacity for self-evolution, they are also researching it and even putting it to use.

Tian Yuandong (former research director of Meta's FAIR team), who received wide attention during Meta's layoff wave, officially announced at the beginning of this year that he was starting a company. His startup is named Recursive Superintelligence (RSI), and its goal is aimed squarely at AI self-evolution.

This is the company that recently closed a $650 million financing round at a valuation of $4.65 billion (about 31.5 billion yuan), making it another star Silicon Valley AI team courted by a host of giants.

So, what exactly is AI self-evolution? Will self-evolution lead to AI slipping out of control? How should humans coexist with AI?

AI self-evolution, already under way, was also a major topic at this year's Beijing Academy of Artificial Intelligence Conference, where we heard the thoughts and predictions of four young AI scientists on the subject.

Perhaps, from their perspectives, we can catch a glimpse of where AI self-evolution is heading and find some ideas for dealing with AI anxiety.

The AI scientists invited by the Beijing Academy of Artificial Intelligence Conference to discuss this issue are:

Lin Tao, a specially appointed researcher in the Department of Artificial Intelligence, School of Engineering, Westlake University;

Gu Yu, co-founder of NeoCognition;

Wang Yan, a former expert researcher in Tencent's Hunyuan Frontier;

Yang Mengyue, who holds a PhD from University College London and is an assistant professor at the University of Bristol.

The following is an edited summary of the four guests' conversation, with the original meaning unchanged:

01 What is AI self-evolution?

Question: Many current AI systems can reflect and revise their own prompts, which already has a flavor of self-improvement. Defined more strictly, what is AI self-evolution?

Lin Tao: I think self-evolution should be multi-level. It can be the evolution of the external brain or of the internal brain.

Most importantly, AI should be able to recognize its own limitations and evolve both its external and internal brains at the same time, or internalize more external abilities as the external brain evolves, and in that way evolve the internal brain as well.

Gu Yu: I think the two most important dimensions of RSI (recursive self-improvement) are proactiveness and learning.

Learning is about giving AI reliable continual-learning and online-learning algorithms. The other issue is self-evolution: agents need to know where they need to evolve.

So self-evolution needs to solve two separate problems:

One is metacognition at the "what" level. You need to know what you lack, what you need, and how to choose;

The other is the "how" level, that is, how to actually implement the learning algorithm.

Wang Yan: At least at this point in time, if a system relies less on humans than traditional SFT and RL do, it has in effect achieved self-evolution.

Yang Mengyue: What we now call RSI is actually a step beyond self-improvement. It is not just about enhancing abilities, but also about whether the "ability to evolve" itself can become stronger.

One important point is that Jeff Clune and Tim Rocktäschel, two members of the founding team of Recursive (Recursive Superintelligence), both work on open-endedness.

So, what is open-endedness?

In an open world, can an agent ask questions on its own? Can it discover where the boundaries of its knowledge, its system, and its memory lie? It has to break through its own boundaries in order to ask questions.

To achieve self-evolution without human intervention, including the evolution of the ability to evolve, the ability to ask questions is very important.

Question: At this point in time, which part of AI self-evolution is the most valuable and the most likely to mature?

Wang Yan: I wonder if you have noticed that model iteration has accelerated since January 2025.

The reason is that the people working on base models, who know the upper limits of AI capabilities best, have stopped writing code. This has actually happened in base model training.

You can also clearly feel that base models are iterating faster, including Claude, GPT, and China's domestic base models. You can't call it complete self-evolution, but AI is indeed iterating AI.

As for which field will mature first, base model training has left the deepest impression on me. Although a human still stands beside it and sets the direction, in essence the base model is already self-evolving.

Question: If we leave the model parameters unchanged and evolve only the other components, can a base model still make a big enough leap in capability?

Wang Yan: Definitely.

In fact, just changing the prompt can produce better results.

For example, I sometimes wonder why interns can't do the work I assign them. When I look at their prompts, I find they are not well written.

Once I rewrite the prompt to be more effective and make the rules clearer, I get better results.

If I can do this, silicon-based beings at a higher level than me can do it better, even without changing the model parameters.

Question: What does Teacher Lin think?

Lin Tao: This should be an iterative process. We need a better harness (control engineering), that is, an external brain, to bring out the upper limit of the current model;

As more and more people build their own harnesses, those programs may be used to train stronger base models;

On top of stronger base models, we will develop stronger harnesses and better external brains. That, too, is an iterative process.

Question: Then which area do you think will mature first, taking all resources into account?

Lin Tao: I think developing a harness is the easiest.

Gu Yu: I prefer to view harnesses and skills from a unified perspective.

Seen that way, they are all long-term memory, just viewed from different angles.

For example, a harness is long-term memory at the meta level (metacognition), a skill is more a long-term memory of workflow or process knowledge, and model parameters are closer to a long-term memory of intuition.

If you ask me which to prioritize, it's hard to say from an academic research perspective. They are all important and complement one another.

From a company's perspective, there are many practical factors. The easier starting point is the harness. With a harness, you can have a product. With a product, you can acquire users. With users, you get data and close the loop. This is a non-technical view.

Yang Mengyue: I am personally more concerned with evolution at the memory level, because my research is on how to understand rules and causality.

People can now feel that models are getting stronger and are starting to cover what harnesses do, gradually eroding the harness and reaching the upper limit.

So it's hard to say how things will develop. Base models may keep getting stronger, and the gains from the harness direction may end up minimal.

02 At which stage does AI self-evolve first?

Question: When is the most appropriate time for AI self-evolution to occur?

Gu Yu: Let me add one thing about the harness. Although it may be eroded by the progress of models, that depends on which part you mean. I think some modules will remain necessary.

For example, modules that ensure the safety and verifiability of models are parts that probabilistic models can never replace.

As for the timing of self-evolution, I think it can be understood as learning plus long-term memory (LTM).

For humans, every act of reasoning and every problem solved is a learning opportunity. Humans don't collect a pile of problems and then learn statically from them.

If we believe that the human way of learning is efficient, I think the same holds for intelligent agents.

You would want agents not to waste any reasoning opportunity, because every act of reasoning can yield a learning signal. This is consistent with the broad philosophy of reinforcement learning, but mainstream deep learning still learns by updating model parameters, which makes an online-learning setting hard to achieve.

So to truly get there, we need new learning algorithms, such as non-parametric updates.
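As a rough illustration of the idea of a non-parametric update, the toy sketch below has an agent write a lesson into an explicit long-term memory after each task instead of changing any model weights. Everything here is an assumption made for illustration: the `MemoryAgent` class, its `learn` and `recall` methods, and the keyword-overlap retrieval are hypothetical and do not describe any speaker's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryAgent:
    """Toy agent that learns online by appending to memory, not by updating weights."""

    # Hypothetical long-term memory: (task keywords, lesson) pairs.
    memory: list[tuple[frozenset[str], str]] = field(default_factory=list)

    def recall(self, task: str) -> list[str]:
        """Return stored lessons whose keywords overlap with the new task."""
        words = frozenset(task.lower().split())
        return [lesson for keys, lesson in self.memory if keys & words]

    def learn(self, task: str, feedback: str) -> None:
        """Non-parametric update: store the feedback as a lesson for similar tasks."""
        self.memory.append((frozenset(task.lower().split()), feedback))


agent = MemoryAgent()
# Each solved task is treated as a learning opportunity.
agent.learn("parse csv file", "check the delimiter before splitting")
lessons = agent.recall("parse a large csv export")
print(lessons)
```

The stored lessons stay explicit and inspectable, which is the sense in which such memory is slower but more transparent than a weight update.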

Question: Is there a difference between System 1 and System 2 here?

Gu Yu: Indeed.

For example, we can regard non-parametric components as System 2, because they are more explicit and slower. But they can still be transformed into System 1, for instance by generating more data from the learned non-parametric rules, just like the transformation from the external brain to the internal brain that Teacher Lin mentioned.

Wang Yan: I have also done a lot of work on TTT, that is, test-time training, and I follow this line of work closely.

I think that when a model predicts the next token, the important thing is to learn the update gradient for each token.

In the future, we will surely find a training algorithm that lets the model learn how to update the gradient for each token. That is real end-to-end thinking.

Lin Tao: From the perspective of model training, it can first be shaped by the harness and then affect post-training. Improving model performance through post-training yields a stronger model, which can then feed back into the pre-training stage to improve the base model's capabilities, forming a closed loop.

So it is evolving all the time, just at different scales and in different ways.

Yang Mengyue: I also think that self-evolution is happening all the time and extends to every aspect.

Take, for example, how a trajectory is generated.

If we ask GPT to answer a question, it is actually reasoning. Reasoning is a process of creation and combination, and creation and combination amount to posing questions to the environment and to humans. So the forward process itself already involves an evolution of mechanism design.

In addition, when the model receives a reward, such as human feedback, how it updates the trajectory after that feedback will also gradually improve the whole process.

Question: Is designing its own benchmark also a sign of AI self-evolution?

Yang Mengyue: Can we now have a growing benchmark, or even a growing, self-evolving world model?

Many current benchmarks are fixed and test against a fixed dataset. In that case, we can always find a model that has been trained to do well on that fixed dataset, no matter what.

To reach AGI, we really need dynamic evaluation that adapts to a model's current capabilities and grows along with it.
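One very simple way to picture a dynamic evaluation that grows with a model is a staircase scheme: raise the task difficulty after each pass and lower it after each failure. The sketch below is only an illustration under that assumption. The functions `next_difficulty` and `run_adaptive_eval`, the integer difficulty levels, and the stand-in "model" are all hypothetical, not a method proposed by the speakers.

```python
def next_difficulty(level: int, passed: bool, max_level: int = 10) -> int:
    """Raise the level after a pass, lower it after a failure, within [1, max_level]."""
    step = 1 if passed else -1
    return min(max_level, max(1, level + step))


def run_adaptive_eval(solve, start_level: int = 1, rounds: int = 8) -> list[int]:
    """Evaluate `solve(level) -> bool` and return the sequence of levels tested."""
    level, history = start_level, []
    for _ in range(rounds):
        history.append(level)
        level = next_difficulty(level, solve(level))
    return history


# Hypothetical model that can solve tasks up to difficulty 4.
history = run_adaptive_eval(lambda level: level <= 4)
print(history)  # [1, 2, 3, 4, 5, 4, 5, 4]
```

The tested levels settle around the boundary of what the model can do, so the evaluation tracks current capability rather than a fixed dataset.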

Wang Yan: When we first started working on generation, there were no benchmarks. Back then, outputs were evaluated by humans.

I'm not sure this can be evaluated with a benchmark, because a static benchmark definitely cannot evaluate it.

I'm not sure a dynamic benchmark can evaluate it either, because then both sides are self-evolving agents. It may eventually come back to human evaluation.

From this perspective, it may not be possible to evaluate it with a benchmark at all.

Question: Is it difficult to design an automated evaluation method?

Wang Yan: Yes.

Now there are many
