
AGI won't arrive that soon: Without the ability to learn continuously, AI can't fully replace white-collar workers.

God Translation Bureau | 2025-07-14 07:18
Continuous learning is a huge bottleneck.

God Translation Bureau is a translation and compilation team under 36Kr, focusing on fields such as technology, business, the workplace, and life, with a particular emphasis on introducing new technologies, new ideas, and new trends from abroad.

Editor's note: A senior AI practitioner, drawing on more than a hundred hours of hands-on development, identifies the biggest bottleneck to getting AGI deployed: models lack the continuous learning ability that humans possess. Through an analysis of the "learning to play the saxophone" paradox and the data ceiling, the author predicts limited change in the short term but a sharply rising probability of an intelligence explosion within a decade, injecting a dose of sober thinking into the current enthusiasm. This article is a compiled translation.

"Things always happen more slowly than you expect, and then more quickly than you can imagine." —— Rudiger Dornbusch

On my podcast, the debate about the timeline for the realization of Artificial General Intelligence (AGI) has never stopped.

Some guests believe it will take another 20 years, while others firmly believe it will only take 2 years.

Here are my thoughts in June 2025.

The Dilemma of Continuous Learning

People often say that even if the development of AI completely stalls, the transformative power of existing systems on the economy will still far exceed that of the Internet.

I have doubts about this.

Today's large language models (LLMs) are genuinely impressive, but the reason Fortune 500 companies have not used them to transform their workflows is not managerial conservatism. The fundamental problem is that it is hard to get human-level output on routine white-collar work out of them, and that difficulty comes down to capabilities the models are still missing.

As a self-styled member of the "AI vanguard," I have spent over a hundred hours building small LLM tools, and the hands-on experience has actually lengthened my expected timeline. I tried to get LLMs to rewrite auto-generated transcripts so they read the way a human editor would, to pull Twitter-worthy material out of interviews, and to co-write articles with me paragraph by paragraph. These are short-horizon, closed-loop tasks with language in and language out, squarely within what should be an LLM's core competence. Yet the actual performance was merely passable (5/10), which is, of course, still quite impressive.

The core problem is that an LLM cannot improve over time the way a human does; the lack of continuous learning is a fatal flaw. Although the baseline performance of large models on most tasks now exceeds that of an average person, you cannot give a model high-level feedback and have it stick. Its out-of-the-box capability is its ceiling. Even if you repeatedly tune the system prompt, the result falls far short of the growth and accumulated experience of a human employee.

The core of human value is not raw intelligence, but the ability to build context, reflect on mistakes, and continuously optimize details in practice.

Imagine teaching a kid to play the saxophone: she tries to play a note, listens to how it sounds, and adjusts. Now swap in this scheme: the first student gets one attempt, and the moment she slips up you send her away and write a detailed analysis of what went wrong; the next student reads your notes and is asked to play a Charlie Parker piece cold; when she fails, you refine the notes and hand them to a third student.

This is obviously never going to work. No matter how sophisticated the written instructions are, nobody learns to play the saxophone by reading them, and yet that is the only way we currently "teach" an LLM.
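To make the analogy concrete, here is a minimal sketch of what this style of "teaching" amounts to in practice. The `call_llm` and `critique` functions are hypothetical stubs standing in for a real chat API and a human reviewer; the point is only that every lesson accumulates as prompt text while the model itself stays frozen.

```python
# Minimal sketch of "teaching" by prompt editing (stubs, not a real API).

def call_llm(system_prompt: str, task: str) -> str:
    """Stand-in for a chat-completion call; the model's weights never change."""
    return f"[model output for: {task}]"

def critique(output: str) -> str:
    """Stand-in for the human writing up what went wrong with the last attempt."""
    return f"Lesson: avoid the problems seen in {output!r}."

system_prompt = "Rewrite this transcript so it reads naturally."
for attempt in range(3):                       # three "students"
    output = call_llm(system_prompt, task="raw transcript ...")
    system_prompt += "\n" + critique(output)   # lessons accumulate only as text

# Every run starts from the same frozen model plus a longer instruction sheet;
# nothing like practice or muscle memory carries over between attempts.
```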

Yes, there is reinforcement learning fine-tuning (RL fine-tuning), but LLMs lack the self-directed adaptability of human learning. My editors are excellent because they discover the important details of the job on their own: they think about what the audience wants, learn what gets me excited, and improve their daily workflow. If every subtask requires a custom-built reinforcement learning environment, that kind of growth is out of reach.

Perhaps future, smarter models will be able to build their own reinforcement learning loops: I give high-level feedback, and the model generates verifiable practice problems, or even constructs a training environment, to shore up its weaknesses. But pulling this off is extremely hard, and how well the technique would generalize is an open question. Models will probably learn on the job the way humans do someday, but over the next few years I don't see a clear path to bolting online continuous learning onto the existing LLM architecture.
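As a rough illustration of the self-built loop described above, and of where it could break, here is a hypothetical sketch; every function is an assumed stub, not a real API, and the hard parts are exactly the stubs.

```python
# Hypothetical sketch of a model turning high-level feedback into its own RL loop.
# All functions are stubs; building reliable versions of them is the open problem.

def high_level_feedback() -> str:
    return "Your transcript rewrites flatten the speaker's voice."

def propose_drills(feedback: str, n: int = 4) -> list[str]:
    """Assumed capability: turn vague feedback into verifiable practice tasks."""
    return [f"Drill {i}: rewrite snippet {i} while preserving the speaker's voice"
            for i in range(n)]

def verify(drill: str, answer: str) -> float:
    """Automatic reward signal; making this trustworthy outside math/code is the hard part."""
    return 1.0 if "voice" in answer else 0.0

def rl_update(params: dict, reward: float) -> dict:
    """Stand-in for an actual RL fine-tuning step on the model's weights."""
    params["skill"] = params.get("skill", 0.0) + 0.1 * reward
    return params

params: dict = {}
for drill in propose_drills(high_level_feedback()):
    answer = f"attempt at: {drill}"
    params = rl_update(params, verify(drill, answer))

# Open questions the author flags: do the self-generated drills actually target
# the weakness, and can the verifier be trusted outside narrow, checkable domains?
```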

LLMs can show real flashes of insight within a single conversation. For example, when we were co-writing an article, its suggestions for the first four paragraphs were poor. After I rewrote them myself and said bluntly, "What you wrote is terrible. Look at my version," its subsequent suggestions got noticeably better. But that subtle grasp of my preferences resets to zero the moment the session ends.

A seemingly viable workaround is a longer context window plus memory compaction (Claude Code, for instance, compressing its memory into a summary roughly every 30 minutes). But outside software engineering, condensing rich experience into a text summary is bound to be fragile; just try summarizing in words how to teach someone the saxophone. Even Claude Code often throws away hard-won fixes after compaction, because the summary fails to preserve the key decision-making logic.
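The exact mechanism inside Claude Code isn't spelled out here, but the general pattern of summary compaction, and why it is lossy, can be sketched generically. Everything below is a hypothetical illustration, not any tool's real implementation.

```python
# Generic sketch of summary compaction: when the running history gets too long,
# collapse it into a short summary and continue from that. Lossy by construction.

history: list[str] = []

def summarize(events: list[str], budget_chars: int = 400) -> str:
    """Stand-in for asking a model to condense the session; here, crude truncation."""
    return " / ".join(events)[:budget_chars]

def remember(event: str, max_events: int = 50) -> None:
    history.append(event)
    if len(history) > max_events:                  # context budget exceeded
        compressed = summarize(history)
        history.clear()
        history.append(f"[summary] {compressed}")
        # A hard-won fix now survives only if the summary happened to keep it;
        # the reasoning that produced the fix is gone either way.

for step in range(120):
    remember(f"step {step}: tried an approach, adjusted based on what happened")
print(history[0][:80], f"... ({len(history)} items retained)")
```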

So I disagree with the assertion made by my podcast guests Sholto and Trenton (the quote is Trenton's):

Even if AI development stalls and the models lack general intelligence, their economic value will still be huge. Data from white-collar jobs is very easy to collect, and full automation will surely be achieved within the next five years.

If AI development stalled today, I think fewer than 25% of white-collar jobs would be replaced. Many tasks can be automated (Claude 4 Opus really can rewrite transcripts, for example), but because the model cannot keep learning and adapting to my preferences, I would still hire a human. Without a breakthrough in continuous learning, collecting more data will hardly change this: AI may muddle through subtasks, but without the ability to build context it will never become a real "employee."

This makes me pessimistic about transformative AI in the near term, but extremely optimistic about the decade after that. Once the continuous learning bottleneck is broken, the value of these models will jump. Even without a pure "software singularity" (models independently iterating to produce smarter successors), there could still be a broadly deployed intelligence explosion: AI spreading into every sector of the economy and learning on the job the way humans do. More striking still, the copies can pool what they each learn, which is equivalent to a single AI learning every job in the world simultaneously. An AI with online learning ability could rapidly become a superintelligence without any further algorithmic breakthroughs.

That said, I don't expect to tune into an OpenAI livestream one day and hear that "continuous learning has been solved." Labs are incentivized to ship innovations quickly, so we will first encounter incomplete early versions (or test-time training) before models acquire truly human-like learning ability. Until this enormous bottleneck is broken, we have plenty of time to prepare.

Computer Use

When I interviewed Anthropic researchers Sholto Douglas and Trenton Bricken, they predicted that reliable computer-use agents will appear by the end of next year.

Computer-use agents already exist, but they perform poorly. What Sholto and Trenton envision is something entirely different: by the end of next year, you simply tell the AI, "Go file my taxes," and it completes the task on its own, digging through your email, Amazon orders, and Slack messages, chasing invoices from suppliers, organizing every receipt, separating out business expenses, asking you to confirm anything ambiguous, and finally submitting Form 1040 to the IRS.

I have doubts about this. I'm not an AI researcher, so I won't pronounce on the technical details, but based on what I currently know, my reasons for skepticism are as follows:

  • As tasks get longer, the execution and feedback loops lengthen too. Just checking whether a two-hour computer-use task was done correctly means working through two hours of actions, and processing the screenshots and video along the way costs far more compute than text. A slowdown in progress is almost inevitable.

  • Multimodal data for computer use is inherently scarce. I really like Mechanize's incisive point about automated software engineering: "The model scaling of the past decade has been fueled by a vast amount of free Internet text, but that only solves natural language. Want to train a reliable computer-use agent? Imagine trying to train GPT-4 on the text data that existed in 1980; even with the compute, it would be useless."

Perhaps pure text training has already taught the models the logic of UIs? Maybe reinforcement learning fine-tuning (RL fine-tuning) can break through the data limitation? But I have seen no evidence that models' data hunger has eased, least of all in areas where they are weak.

Another possibility: since models are already masters of front-end coding, could they generate millions of simulated UIs to practice on? My view is as follows:

  • The reinforcement learning recipe described in DeepSeek's R1 paper looks simple, yet it took two years to get from the release of GPT-4 to the arrival of o1. Of course, it would be absurd to call the R1/o1 work simple; an enormous amount of engineering, debugging, and idea-filtering sits behind it. But that is exactly my point: even a "simple" idea like "train the model on verifiable math and coding problems" took that long. For computer use, where data is scarcer and the modality is entirely different, we are probably underestimating how hard it will be to crack.

Reasoning

But let's not pour cold water too quickly. I don't want to be like those spoiled kids on Hacker News who, handed a goose that lays golden eggs, can only complain that it cackles too loudly.

Have you read the chains of thought from o3 or Gemini 2.5? They genuinely reason: they break the problem down, work out what the user needs, inspect their own internal monologue, and change course the moment it goes wrong. And we've become so used to it that we just think, "Of course machines can think, deduce, and give smart answers. Isn't that what machines are for?"

Some people are overly pessimistic only because they haven't watched a frontier model work in their own area of expertise. Give Claude Code a vague requirement and ten minutes later it can produce a usable program from scratch. That kind of experience makes it hard not to wonder, "Did it really just do that?" You can talk about circuits, training distributions, or reinforcement learning, but the most straightforward description is that a baby-level general intelligence has woken up. In that moment, some voice inside you says, "Yes, we really have built an intelligent machine."

My Predictions

My probability distribution is very wide (which is exactly why I insist on thinking in probabilities). It is entirely reasonable to prepare for a misaligned artificial superintelligence (ASI) in 2028; that outcome is by no means a fantasy.

Here are the milestones on which I would take a 50/50 bet:

2028

AI will be able to handle my small business's taxes within a week, like a competent general manager: digging through various websites for receipts, filling in missing documents, chasing invoices over email, completing the forms, and submitting everything to the IRS, end to end.

Computer use today is at its GPT-2 stage: pretraining data is scarce, and the model has to optimize sparse rewards over long horizons in an unfamiliar action space. But the base models are smart enough that the latent capability may already be there, and with the current surge in global compute and researcher headcount, the gap may close quickly. Small-business tax filing matters to computer use roughly the way GPT-4 mattered to language models, and it took exactly four years to get from GPT-2 to GPT-4.

(Note: there will certainly be dazzling demos in 2026 and 2027, as flashy and as impractical as GPT-3 was in its day. But they will not be able to handle a complex, week-long, end-to-end project involving computer use.)

2032

AI will learn on the job about as well as a human white-collar worker. If I hire an AI video editor, six months in it will have a deep grasp of my preferences, the tone of the channel, and what the audience likes, the same kind of actionable knowledge a human colleague would have picked up.

Embedding a continuous learning module into existing models is hard, but seven years is a long time! GPT-1 was released only seven years ago. Finding a workable approach to on-the-job learning within the next seven years is not a pipe dream.

You may ask, "You just emphasized that continuous learning is a fatal shortcoming, and now you're predicting an intelligence explosion and wide - scale popularization in seven years?" Yes, I do foresee a world with drastic changes in the short term.

AGI Timeline Prediction

It's make-or-break this decade. (Strictly speaking, the marginal probability just declines year by year, but "make or break" is catchier.) Over the past decade, AI progress has been driven by a roughly fourfold annual increase in the training compute of frontier systems. Whether the constraint is chips, electricity, or the share of GDP spent on training, that growth will certainly stop after this decade. Beyond 2030, progress will have to come from algorithmic breakthroughs alone, the low-hanging fruit of the deep learning paradigm will have been picked, and the annual probability of AGI arriving will drop sharply.
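The back-of-the-envelope arithmetic behind "that growth will stop" is just compounding; the tiny calculation below (illustrative only) shows how quickly a 4x-per-year trend outruns any fixed resource base.

```python
# Compounding a 4x annual growth rate in frontier training compute.
annual_growth = 4
for years in (5, 10):
    factor = annual_growth ** years
    print(f"After {years} years at 4x/year: {factor:,}x today's compute")
# After 5 years: 1,024x; after 10 years: 1,048,576x. Chips, electricity, and the
# share of GDP spent on training would all need to scale by roughly the same
# factor, which is why the author expects this driver of progress to run out.
```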

This means that if reality follows the longer end of my timeline, the world will look much the same through the 2030s and perhaps the 2040s. But if it doesn't, then even while staying clear-eyed about AI's current limitations, we should be prepared for a genuinely crazy era.

Translator: boxi.