The recursive self-improvement of AI: the outline of "that thing" is coming into focus
God Translation Bureau is a translation team under 36Kr that focuses on technology, business, the workplace, and everyday life, and specializes in introducing new technologies, new ideas, and new trends from abroad.
Editor's note: Stop talking about "human-machine collaboration"; we have entered the era of "managing AI." As software factories reach zero human-written code, the recursive self-improvement of AI is making the exponential curve even steeper. This article is a compiled translation.
In October 2023, I wrote an article about "the shadow outline of that thing," speculating on the forms AI might take over the next few years. I think we can now see this thing far more clearly, along with some of the consequences it brings. As I discussed in a recent blog post, we have entered a new stage of AI. After ChatGPT appeared, human-AI collaboration took the form of what I call "collaborative intelligence": humans get help with tasks through repeated rounds of prompting. Then, starting at the end of 2025, thanks to AI agents such as Claude Code, OpenAI's Codex, and OpenClaw, we entered a new era. You simply hand your work over to these systems, even work that would take a human hours to complete, and they return reasonable, useful results within minutes. This is an era of "managing AI," not merely "collaborating with AI."
This new way of applying AI is the product of the exponential growth of AI capabilities. It means that if you do not grasp how quickly those capabilities are increasing, you cannot understand our current situation or predict our future.
Riding the Wave of Exponential Growth
Exponential growth is hard to visualize intuitively, so instead of starting with a chart, I'd like to start with otters. If you've been following my writing about AI, you'll know my "otter test": I challenge AI image models to generate a picture of "an otter using WiFi on an airplane." As shown below, the progress from 2022 (the year ChatGPT was released) to 2025 was astonishingly fast.
So what has happened since that April 2025 image? As image generation approaches perfection, video has become the new frontier, and it has seen the same exponential progress. To demonstrate, I gave the following prompt to the most advanced AI video model (not yet released in the United States), developed by ByteDance, the parent company of TikTok: a documentary about how otters view Ethan Mollick's "otter test," which evaluates AI by its ability to generate images of otters sitting on an airplane.
Apart from one mispronunciation, the video was nearly perfect; the otters were even given anthropomorphic expressions. Video models are cool, of course, but they don't necessarily reflect how far practical agentic AI has come. So if we look at benchmarks of AI capability, do we see the same exponential curve?
We do see this trend in the most famous evaluation in today's AI field, the METR long-task chart. It tries to measure AI progress by observing how much of a human's workload an AI can complete autonomously at a given level of reliability. The chart has drawn some criticism, and even METR itself has pointed out potential problems, but even if you dislike the METR chart, you'll find that most charts measuring AI capability show the same curve.
As an example, I selected four AI benchmarks of varying difficulty and type and plotted how their scores have changed in the figure below. In the upper left is the "Google-Proof" Q&A benchmark, a knowledge test on which even graduate students using Google score only 34% outside their own field and about 70% within it; the current top AI scores 94%. On GDPval, where industry experts compare the performance of AI against senior professionals on complex tasks, the latest AI matches or exceeds top humans in 82% of cases. The same pattern appears in "Humanity's Last Exam," a set of difficult questions written by university professors that demand deep domain expertise. We can even use AI's ability to solve puzzles as a reference. Each benchmark shows capabilities rising rapidly, with little sign of slowing, at least until they reach the test's score ceiling.
Setting the exponential charts aside, it's important to recognize that all of these tests have limitations, and AI performance still shows a "jagged frontier": it excels at some tasks while fumbling others. And despite the impressive benchmark results, enterprises are still at a very early stage of adopting AI, which means that, so far, little has changed for most organizations. But "most organizations" does not mean all of them. We are beginning to see new approaches to organizational management that exploit the new capabilities of AI agents.
Radical Changes in How We Work
A few weeks ago, a three-person team at StrongDM, a security software company focused on access control, announced that they had built a "software factory": a model of working with AI agents that relies entirely on AI to write, test, and release software to production without human intervention. The process includes two rather radical rules: "code shall not be written by humans" and "code shall not be reviewed by humans." To keep the factory running, each human engineer's spending on AI tokens is expected to match their salary, at a minimum of $1,000 per day.
The core concept of the "factory" is to turn a human-written product roadmap into actual products. Coding agents develop software according to the roadmap, while testing agents exercise it in a simulated customer environment (built on demand by the testing agents themselves). The two groups of agents feed results back to each other and repeat the cycle until the output satisfies the AI. Only then do humans review the finished product and deliver it to the customer; nobody touches, or even sees, the underlying code along the way. A minimal sketch of this loop appears below.
Pictured: a "twin" of Slack, a simulated version built by the software factory's testing agents, in which a group of simulated customers submit requests to test the tool the coding agents are developing.
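To make the structure of that loop concrete, here is a minimal sketch in Python. It is not StrongDM's implementation: `CodingAgent`, `TestingAgent`, and their methods are hypothetical stand-ins for whatever agent framework a team actually uses. Only the control flow, with agents iterating until the tests pass and humans reviewing only the final output, reflects the process described above.

```python
# A minimal, hypothetical sketch of the "software factory" loop.
# The agent classes are placeholder stubs, not a real framework.
from dataclasses import dataclass


@dataclass
class TestReport:
    passed: bool
    feedback: str  # the testing agents' notes, fed back to the coding agents


class CodingAgent:
    """Stand-in for an LLM coding agent (e.g. one driving Claude Code or Codex)."""

    def build(self, item: str, feedback: str = "") -> str:
        # A real agent would generate code here; this stub returns a label.
        return f"artifact for {item!r} (revised per: {feedback!r})"


class TestingAgent:
    """Stand-in for an LLM testing agent that simulates customers."""

    def build_simulated_environment(self, item: str) -> None:
        # e.g. spin up a "twin" of the customer's product, as in the Slack example
        pass

    def evaluate(self, artifact: str) -> TestReport:
        # A real agent would exercise the artifact with simulated customers.
        return TestReport(passed=True, feedback="")


def software_factory(roadmap: list[str], max_rounds: int = 20) -> list[str]:
    """Iterate each roadmap item until the testing agents are satisfied."""
    coder, tester = CodingAgent(), TestingAgent()
    finished: list[str] = []
    for item in roadmap:
        tester.build_simulated_environment(item)
        feedback = ""
        for _ in range(max_rounds):
            artifact = coder.build(item, feedback)  # rule 1: no human writes code
            report = tester.evaluate(artifact)      # rule 2: no human reviews code
            if report.passed:
                finished.append(artifact)           # humans see only the finished product
                break
            feedback = report.feedback              # agents critique each other and retry
    return finished


print(software_factory(["add message threading"]))
```

The notable design choice is that the human appears nowhere inside the inner loop; the roadmap goes in at the top and a reviewed product comes out at the bottom.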
Obviously, there are many details behind making this model work, and the StrongDM team has shared most of them publicly. They also invited some sharp outside observers to watch the factory in operation and comment on it, so you can learn more about the strengths and weaknesses of the approach from the accounts of Simon Willison and Dan Shapiro. In many ways, though, the specific details of the software factory matter less than the fact that such a radical experiment in how work gets done is not only possible but probably inevitable. AI has become powerful enough to change how organizations operate, and as the models keep improving, experiments like this are only beginning.
Rolling Disruption
Practical agents, uneven exponential progress, and the ability to run radical experiments on the nature of work combine to form a rolling, unpredictable environment of AI evolution. As AI capabilities cross various thresholds, they unlock brand-new applications, sometimes changing people's sense of AI's boundaries overnight. Meanwhile, organizations experimenting with AI will discover the ways of operating that suit them best, producing sudden announcements of major strategic shifts or changes in how talent is valued. And as AI continues to advance, more policymakers will take an interest in AI governance, which will bring them into conflict with AI companies.
This is not speculation; we witnessed all of it in a single week. On February 22, a little-known financial firm, Citrini Research, published a fictional scenario describing how the application of AI might destroy a number of established enterprises by 2028. Although the piece contained plenty of obviously far-fetched elements, it hit a nerve on Wall Street and set off violent swings in stock prices. On February 26, the financial services company Block announced a 40% layoff and hinted that AI was a factor; in reality, AI's role was probably greatly exaggerated, serving mostly as a scapegoat for the mass layoff. Finally, to round out the week, on February 27 the Pentagon and the AI company Anthropic clashed publicly over a single question: who gets to set the rules for the government's use of Claude?
In many ways, the surface of each of these stories does not quite match the facts. Citrini's report was a fictional scenario, Block's layoffs were not really caused by AI, and the conflict over AI in warfare involves a tangle of issues that are not yet fully clear. But I think that short week is a good preview of the near future: sudden revelations of AI capability trigger chain reactions in markets; AI has an increasingly real impact on employment (even if its short-term costs and benefits remain contested); and AI companies are becoming ever more entangled with policymaking worldwide. As the stakes rise, things are likely to get more volatile.
Of course, things may also calm down. Perhaps AI progress will hit a ceiling, organizations will gradually absorb these changes, and as people learn what AI can and cannot do, the rolling disruption will become more manageable. History offers many technologies that were expected to change everything overnight but in fact took decades to fully reshape the economy.
But I am not optimistic about that calmer scenario.
One reason is that AI companies are telling us quite plainly what comes next: Recursive Self-Improvement (RSI). The core idea is that AI systems are increasingly used to build better AI systems, forming a feedback loop that could make the curves I showed earlier even steeper. At the Davos Forum this January, Anthropic's Dario Amodei explained that if you create models that are good at programming and AI research, you can use them to build the next generation of models, accelerating the iteration cycle. He noted that engineers inside Anthropic now hardly write code themselves. When OpenAI released its latest Codex model in February, the company said it was "our first model that played a key role in its own creation." And Google DeepMind's Demis Hassabis acknowledged on the same Davos panel that closing the self-improvement loop is a goal all the major labs are actively pursuing, though he also warned of capabilities still missing and of real risks.
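To see why a closed loop would steepen the curve, here is a toy model in Python. Every number in it is invented purely for illustration; the only assumption it encodes is the one Amodei describes, namely that each generation of models makes AI research itself faster, so the time between generations shrinks.

```python
# Toy model of recursive self-improvement. All parameters are invented:
# `gain` is the (made-up) capability multiplier per generation, and
# `first_cycle_months` is the (made-up) time to build generation 1.
def rsi_timeline(generations: int, gain: float = 1.5,
                 first_cycle_months: float = 12.0) -> None:
    capability, month = 1.0, 0.0
    for g in range(1, generations + 1):
        cycle = first_cycle_months / capability  # better models -> faster AI research
        month += cycle
        capability *= gain                       # each generation is `gain`x more capable
        print(f"gen {g}: month {month:6.1f}, capability {capability:6.2f}x")


rsi_timeline(6)
```

Under these invented parameters, the gap between generations shrinks geometrically: the first generation takes a year, while the sixth follows the fifth by less than two months. Whether real-world bottlenecks break this loop is exactly the open question.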
We don't know where this leads. RSI has existed as a theoretical concept for decades. The major labs may hit bottlenecks in computing power, in data, or in the sheer difficulty of AI research itself. Nor do we know whether AI built on large language models will eventually hit an insurmountable ceiling, or whether that "jaggedness" can ever be smoothed out. I don't think there is a definitive answer yet, but I do think recursive self-improvement is no longer science fiction: it is an explicit item on the roadmap of every mainstream AI company. If the loop really closes, the exponential curves we observe will become steeper, and their endpoint even harder to predict.
This is where we stand: that turbulent week in February was a preview of what it really feels like when AI's ever-increasing capabilities hit markets, employment, and government all at once. The sense of uncertainty will likely spread. But uncertainty is not powerlessness. When a technology is this powerful and not yet settled, the choices individuals and organizations make right now matter enormously. We can now see the outline of this "thing," but we can still shape the thing itself and what it means for all of us. There are obviously no ready-made rules or precedents for how to use AI at work, in school, or in government. That is a problem, but it also means that every organization working out best practices for AI is setting a precedent for others. The window for shaping this "thing" may not stay open long, but it is not closed yet.
Translator: boxi.