Racing toward AGI, Claude is crowned at year's end, stunning the entire internet with nearly 5 hours of autonomous coding.
As 2025 draws to a close, it turns out the real master has been hiding in plain sight!
It's not Google, nor OpenAI, but Anthropic's flagship coding model, Claude Opus 4.5.
The latest report released by METR states that Claude Opus 4.5 can continuously code autonomously "for up to 5 hours without crashing".
Even OpenAI's most powerful programming model, GPT-5.1-Codex-Max, has to admit defeat.
Right now, the entire internet is marveling at the coding strength of Claude Opus 4.5.
The length of tasks that AI coding agents can handle is not only growing exponentially; the doubling rate itself keeps accelerating!
From 2019 to 2024: The task duration doubles every 7 months.
From 2024 to 2025: The task duration doubles every 4 months.
Many people will instinctively shake their heads when they see this curve for the first time.
Some people don't understand. Some people are reluctant to accept it.
But one fact is becoming clearer: The tasks that AI coding agents can complete continuously are moving from the "minute level" to the "hour level", and the acceleration is still increasing.
Netizens think this is the most important chart about AI:
Why is this chart called the "most important chart"?
Because it answers a key question:
Has AI hit a wall? Is AGI just another utopia? How much has AI really progressed in 2025?
It's normal that ordinary users don't feel much difference. For most people, the models have long been able to handle daily questions:
"Recommend a movie", "Explain this concept", "Write a piece of copywriting".
But the real changes are happening on another front: coding agents.
And this is precisely an area that most people (including journalists and policymakers) rarely get to see firsthand.
These progressions may seem small, but when accumulated, they are of great significance.
In April 2026, the first batch of AI agents will be able to independently complete a full human workday;
By the end of 2026, AI will be able to complete half a week's worth of tasks;
By the end of 2027, AI will be able to complete two months' worth of tasks;
By the end of 2028, AI will be able to complete several months' worth of human work;
By 2030, AI will be able to undertake most of the management work of small businesses or organizations.
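Taken at face value, the timeline above is largely the arithmetic of a doubling curve. The back-of-the-envelope sketch below is a rough check, not METR's actual forecast: it assumes the ~4.8-hour horizon at the end of 2025 simply keeps doubling every 4 months, and the milestone hour counts (an 8-hour workday, a 20-hour half week, roughly 320 hours for two months of work) are my own simplifying assumptions.

```python
import math

# Rough projection: assume the 50% time horizon keeps doubling every 4 months
# from ~4.8 hours (Claude Opus 4.5, end of 2025). Milestone hour counts are
# illustrative assumptions, not METR's definitions.
HORIZON_HOURS_END_2025 = 4.8
DOUBLING_MONTHS = 4.0

def months_until(target_hours: float) -> float:
    """Months after end of 2025 until the projected horizon reaches target_hours."""
    return DOUBLING_MONTHS * math.log2(target_hours / HORIZON_HOURS_END_2025)

milestones = {
    "a full workday (8 h)": 8,
    "half a work week (20 h)": 20,
    "two months of work (~320 h)": 320,
}
for label, hours in milestones.items():
    print(f"{label}: ~{months_until(hours):.0f} months after end of 2025")
```

Under these assumptions, a full workday arrives roughly 3 months out and two months of work roughly 24 months out, which is in the same ballpark as the timeline above; the later milestones are far more sensitive to whether the 4-month doubling actually holds.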
Exponential Growth of AI
The Era of Agents Has Arrived
In order to quantitatively compare the capabilities of AI and humans, METR proposed a new indicator in March this year: the 50% task-completion time horizon.
In other words, think of the AI as a new employee: give it tasks of increasing length (measured by how long they take a skilled human on average) and find the task length at which the AI still succeeds about 50% of the time.
GPT-5.1-Codex-Max can already complete software engineering tasks lasting up to 2 hours and 53 minutes (at a 50% success rate), roughly four times the horizon of o1.
And the 50% time horizon of Claude Opus 4.5 is about 4 hours and 49 minutes. This is the longest time horizon announced to date.
Although its 50% task-completion time horizon is long, Opus 4.5's 80% time horizon is only 27 minutes, comparable to previous models and below the 32 minutes of GPT-5.1-Codex-Max.
However, the wide gap between Opus 4.5's 50% and 80% horizons reflects a flatter fitted logistic success-rate curve, which means the Opus model's relative advantage shows up on long-duration tasks.
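To see what these horizon numbers mean concretely: METR's approach amounts to fitting a logistic curve that relates an agent's success probability to the (log) time a task takes a human, then reading off where that curve crosses 50% and 80%. The sketch below is a minimal reconstruction of that idea using scikit-learn on made-up (task_minutes, succeeded) data; the numbers and names are illustrative, not METR's actual evaluation data or code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical results for one agent: each task is labeled with how long it
# takes a skilled human (minutes) and whether the agent succeeded.
task_minutes = np.array([2, 5, 10, 20, 45, 90, 180, 360, 720, 1440])
succeeded    = np.array([1, 1,  1,  1,  1,  1,   0,   1,   0,    0])

# Fit success probability as a logistic function of log(task length).
X = np.log(task_minutes).reshape(-1, 1)
model = LogisticRegression().fit(X, succeeded)

def horizon(p: float) -> float:
    """Task length (minutes) at which the fitted success probability equals p."""
    intercept, slope = model.intercept_[0], model.coef_[0][0]
    # Solve p = sigmoid(intercept + slope * log(t)) for t.
    return float(np.exp((np.log(p / (1 - p)) - intercept) / slope))

print(f"50% time horizon: ~{horizon(0.5):.0f} minutes")
print(f"80% time horizon: ~{horizon(0.8):.0f} minutes")
```

The flatter the fitted curve, the further apart the 50% and 80% horizons sit, which is exactly the pattern the Opus 4.5 results show.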
Some people even think that Claude Code already comes close to meeting the definition of artificial general intelligence.
This last statement may be an exaggeration - but it reflects a certain reality.
2025 can be regarded as the most chaotic year for AI discussions. The gap between the actual progress and the focus of public opinion has never been so large.
But next year may bring a change - when the influence of coding agents penetrates into every corner of the social economy, people will finally witness its power. Hopefully, by then, we will still have enough time to make full preparations.
AGI Is Approaching
Memory Is the Last Hurdle
It's not surprising that agents can handle tasks for longer and longer periods.
Previous studies generally point to four main reasons:
Stronger Reasoning: Capable of breaking down large tasks into smaller ones.
More Familiar with Tools: Can write code, search the web, and run scripts.
More Stable Self-Correction: Can roll back, retry, and continue to progress after making mistakes.
Non-Decreasing Returns: A slight improvement in per-step accuracy can lead to a huge increase in the task duration that can be handled (see the toy calculation below).
For example, the new generation of models can better plan subtasks, call external tools (such as code writing and web browsing), and correct themselves when making mistakes, thus maintaining a high success rate in task chains lasting for several hours.
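The last point is easy to make concrete with a toy model: suppose a task takes n steps and each step must succeed independently with probability p, so the whole task succeeds with probability p to the power n. Real agents recover from errors, so this is a deliberately pessimistic simplification, but it shows why a small gain in per-step accuracy buys a much longer horizon.

```python
import math

def max_steps(per_step_success: float, target: float = 0.5) -> int:
    """Longest task (in steps) an agent finishes with probability >= target,
    under a toy model where every step must succeed independently."""
    return math.floor(math.log(target) / math.log(per_step_success))

for p in (0.99, 0.995, 0.999):
    print(f"per-step success {p:.3f} -> ~{max_steps(p):>3} steps at a 50% task success rate")
```

Cutting the per-step error rate from 1% to 0.1% stretches the 50% horizon from about 68 steps to about 692, roughly a tenfold gain from a one-percentage-point improvement.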
Of course, while imagining the bright future, we also need to see the current limitations.
But when the task duration moves from "hours" to "workdays", new problems will emerge:
Loss of Context: Forgetting what was said earlier as the task progresses.
Accumulation of Deviations: Small errors turning into big disasters.
Drift of Goals: Losing focus and going off-topic during the task.
Ultimately, they all point to the same core: long-term memory.
Memory: The Last Problem on the Road to AGI
Almost all the ability shortcomings of AI will ultimately involve memory.
You can think of today's large models as a very smart, quick-reacting new employee who "forgets everything after clocking out".
It can write code, reason, and write articles. But once the conversation ends, it hardly remembers what it has done.
Currently, the "memory" of many agents mainly relies on two methods:
Powerful Retrieval Tools: Searching when needed (like running grep over a codebase).
Summary and Compression into Context: Compressing past content into a few paragraphs and feeding it back into the prompt (sketched below).
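To make the second method concrete, here is a minimal sketch of a summarize-and-compress loop. Everything in it is an assumption for illustration: llm() is a hypothetical helper standing in for whatever chat API the agent uses, and word counts stand in for a real tokenizer.

```python
CONTEXT_BUDGET_WORDS = 2000  # crude stand-in for a real token budget

def llm(prompt: str) -> str:
    """Hypothetical helper: call your chat model of choice here."""
    raise NotImplementedError

def compress_history(history: list[str]) -> list[str]:
    """When the transcript gets too long, fold the oldest half into a summary."""
    if sum(len(turn.split()) for turn in history) <= CONTEXT_BUDGET_WORDS:
        return history
    half = len(history) // 2
    summary = llm("Summarize these earlier steps, keeping decisions, file paths, "
                  "and open TODOs:\n\n" + "\n".join(history[:half]))
    return [f"[summary of earlier work] {summary}"] + history[half:]

def agent_step(history: list[str], observation: str) -> tuple[list[str], str]:
    """One agent turn: compress the context if needed, then decide the next action."""
    history = compress_history(history + [observation])
    action = llm("\n".join(history) + "\n\nWhat should be done next?")
    return history + [action], action
```

The obvious weakness is the one this section points to: whatever the summary drops is gone for good, so details the agent needs later may no longer be recoverable.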
Although information retrieval technology has made significant progress, even the best current RAG (Retrieval-Augmented Generation) systems are only about 90% accurate.
Continuously expanding context windows are indeed easing this problem: a larger window means more data can be fed to the model at once, letting it "read" more of a large memory index effectively.
But even so, to reach the "meticulous" memory level of AGI, a breakthrough in the underlying architecture is still needed.
Moreover, the bigger problem is: no system has truly achieved "self-learning".
Without long-term memory, AI cannot become "smarter with use" like humans, cannot learn from mistakes, and can hardly accumulate "common sense" and "wisdom".
Just "remembering" is not enough. Agents must be able to actively "learn" from experience.
Unlike today's agents, the human brain is good at converting short-term experiences into long-term memories, forming knowledge networks and accumulated lessons over time.
If AGI wants to reach the breadth and depth of human intelligence, it also needs such a memory system.
The industry generally believes that memory is the last but most crucial piece of the puzzle for general intelligence.
In other words, the existing "computing power" and "intelligence" of AI may already be approaching what AGI requires. The only thing lacking is a long-lasting and rich memory like that of humans.
Whoever can crack the "memory problem" first will gain a decisive advantage in the AGI race.
Breakthroughs Next Year
Long-Term Passive Memory
Current agents