
Agent's Bitter Awakening: Intelligence is Shifting from Language to Experience

全天候科技 (All Weather Technology), 2026-03-02 08:21

Going into practice.

In 2019, Richard Sutton, the "father of reinforcement learning," published a short essay that went on to influence the entire AI research community.

The core of this article titled "The Bitter Lesson" is just one sentence:

For decades, humans have stuffed domain knowledge into AI, and every time they have been defeated by "letting the machine learn from its own mistakes."

This held for chess, Go, speech recognition, and computer vision. All the carefully designed priors, hand-crafted features, and expert rules were ultimately trampled by large-scale computation and self-play.

Sutton is a recognized founder in the field of reinforcement learning. He spent half his life researching one thing:

Intelligence is not designed; it is forced out by the environment. The continuous interaction between the intelligent agent and the environment is the only reliable path to a higher intelligence ceiling.

After the essay was published, the academic community's reaction was sharply polarized. Researchers who had spent years on feature engineering and expert systems were forced to re-evaluate the long-term significance of their work. The controversy has not subsided to this day, yet Sutton's judgment has been repeatedly vindicated over the seven years since.

Looking back on those seven years from the vantage point of the Agent wave of early 2026, that judgment is coming true in the AI industry in an unexpected way, and most people have not yet realized it.

I

Discussing Agents: Only Half the Story

From the Skill craze led by Claude, to Cowork collaboration, and now the widespread "shrimp-farming craze," Agent has become the hottest term in the AI world today.

But beneath the hype, as Agents grow more capable, the questions the industry fixates on are no longer just "what can it do." They are: how far will its reach extend as permissions keep opening up and the plug-in ecosystem grows richer, and how will it reshape production relations and drive changes in the structure of the economy?

From every product launch, every product review, to every industry tweet, the core question keeps converging: what kind of sweeping business reshuffle will Agents with stronger autonomy and system permissions bring at the application layer, and which industries and niches will the Agent wave destroy?

There have been even more pointed warnings and predictions in the industry: as the scope and depth of Agent substitution expand, risks such as large-scale job losses, widening income disparity, and shrinking effective demand are accumulating, and may trigger structural unemployment and a chain reaction of economic risks.

These narratives are valuable, and they all ask the same type of question: as a tool, where will Agents' reshaping of human society at the application layer lead?

But few people seem to be asking another question:

What qualitative changes will the accelerated large - scale popularization of Agents bring to AI itself at the model level?

This question is the truly important one from Sutton's perspective.

II

The Dead End of Chatbots

Before understanding the deeper value of Agents, it's necessary to see clearly what dead end their predecessor, the Chatbot, has walked into.

At the beginning of 2023, ChatGPT passed 100 million users, the fastest growth in the history of the consumer internet. Product managers around the world woke from their slumber and frantically stuffed dialog boxes into their products. Customer service bots, knowledge Q&A, writing assistants, code completion: everything became a "chat interface."

But by the end of 2024, an embarrassing fact had emerged: after the initial novelty wore off, many users' usage frequency dropped sharply. Multiple media outlets and analysts reported slowing growth in ChatGPT user activity.

Users found that they didn't know what to do with the dialog box. They might occasionally use it to write an email, rephrase something, or ask a question, but they never formed a stable usage habit.

The reason is simple: the interaction mode of a Chatbot is one question, one answer, while real human work is multi-step, multi-tool, and multi-judgment.

If you ask a Chatbot to do market research, it will hand you a plausible-looking article. But you don't know whether the data sources are reliable, whether it missed key competitors, or whether the reasoning behind the conclusions holds. You get a result, but you lose the entire process.

What's even more fatal is that every conversation with a Chatbot is isolated. It doesn't remember last week's preferences, doesn't know the project's context, doesn't understand the organization's business logic. Every time you open the dialog box, you are reintroducing yourself to a polite amnesiac.

This is why, since the second half of 2024, the entire industry has shifted collectively to Agents: the ceiling of Chatbots is plainly visible.

But there is a dimension almost everyone has overlooked: the ceiling of Chatbots is not only a ceiling on product form, but a ceiling on model evolution.

III

Practical Interaction Is the Key

The core logic of Sutton's reinforcement learning philosophy is very clear: The ceiling of static data is the boundary of the known world.

No matter how large the corpus or how many the parameters, the capability boundary of a model trained on a fixed dataset is the boundary of the world that data depicts.

By 2024-2025, this boundary was clearly visible.

The Epoch AI team published a widely cited analysis predicting that, at current consumption rates, high-quality internet text will be essentially exhausted within a few years. The whole industry began talking about the "data wall," a wall built from the physical limit on the total amount of information.

Where will new data come from, then? The answer Chatbots provide is: from user conversations. But the information density of user conversations with Chatbots is extremely low.

Interactions like "Help me make this email more formal," "Write a quick sort in Python," and "What's the GDP of China" are just shallow mappings of human needs.

What the model can learn from these conversations is essentially no different from what it can learn by scraping a new batch of text from the internet. They are all statistical laws of language patterns and lack one thing: causal structure.

The difference with Agents is that, in the process of completing tasks, they generate something static corpora can never provide: decision-making trajectories labeled with causal structure.

A trajectory records, for example: what the goal was, what actions were taken, what feedback the environment returned, where the error occurred, and how it was corrected.

Let's use a concrete example to illustrate the difference. A user tells a Chatbot, "Help me arrange a business trip from Beijing to Shanghai next Wednesday." The Chatbot outputs a travel plan, and the interaction ends. The model learns almost nothing from this: it doesn't know whether the arrangement is reasonable, whether the user is satisfied, or whether its answer actually solved the problem.

Ask an Agent to complete the same task and it runs a complete, autonomous workflow: it first parses the trip requirements and queries the user's past preferences; calling the flight API, it finds the morning flight cancelled due to weather and automatically switches to an alternative; it then filters hotels against the company's travel policy and generates a draft itinerary; when the user responds that "the hotel is too far from the venue," it reselects a hotel within walking distance and outputs the revised final plan.

Each step carries a clear causal signal. The failed API call tells the model to keep fallback plans. The user's preferences tell it to remember usage habits. The user's revision feedback tells it to iterate against the actual need.
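The shape of such a trajectory can be sketched as a simple record type. This is a hypothetical illustration, not any vendor's actual logging format, and the flight numbers and hotel names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class AgentStep:
    """One action in a trajectory, paired with the feedback it produced."""
    action: str       # what the agent did
    observation: str  # what the environment (or user) returned
    success: bool     # did this step advance the goal?

@dataclass
class Trajectory:
    """A goal plus the causally ordered steps taken to reach it."""
    goal: str
    steps: list = field(default_factory=list)

    def record(self, action, observation, success):
        self.steps.append(AgentStep(action, observation, success))

    def failures(self):
        # Failed steps are exactly the signal a Chatbot transcript lacks:
        # each pairs a decision with the consequence that falsified it.
        return [s for s in self.steps if not s.success]

# The business-trip example above, written as a trajectory
# (flight numbers and hotels are hypothetical):
trip = Trajectory(goal="Beijing -> Shanghai, next Wednesday")
trip.record("book morning flight CA1501", "cancelled due to weather", False)
trip.record("book alternative flight MU5102", "confirmed", True)
trip.record("book hotel A", "user: too far from the venue", False)
trip.record("book hotel B (walking distance)", "user accepts itinerary", True)
```

Note that a static corpus would contain only the final itinerary; the two failed steps, and what corrected them, exist only in the interaction log.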

Chatbots only output answers; Agents actually complete tasks autonomously and keep growing through continuous trial and error and correction.

The information density of this kind of data far exceeds that of simple web scraping. It is not a mapping of human language expression but a real record of the agent's game against the real world.

A model trained on this kind of data gains not more knowledge but stronger reasoning and self-correction, the key variables that determine a large model's capability ceiling.

In other words, Agents are the interfaces through which large models obtain evolutionary fuel from the outside world.

Without this interface, the upper limit of the model's capabilities is firmly locked within the boundaries of static data.

IV

Chasing the Ceiling or Piling Up Interfaces?

From the end of 2024 through 2025, there was an interesting divergence in the strategic choices of the leading large-model players.

Leading players like OpenAI and Google applied maximum force to the same wall: raising the ceiling of model capabilities.

At the end of 2024, OpenAI released o3. On ARC-AGI, the high-difficulty benchmark designed by François Chollet and widely regarded as a measure of abstract reasoning, o3 posted results that caught the industry's attention. ARC-AGI's design philosophy is precisely anti-brute-force: Chollet has long insisted that the core of intelligence is abstract reasoning and few-shot generalization, not brute-force search. Yet o3, spending enormous inference-time compute, scored far higher than every previous system on the test.

Chollet's public response was cautious. He did not deny o3's results, but pointed out a key fact: the system consumed far more compute than a human does to solve the same problems, and a high score does not equal a breakthrough in general intelligence.

Google DeepMind continued to push multimodal reasoning in the Gemini 2.0 series.

But Anthropic chose a different path. In October 2024, it shipped a Claude feature that looked unglamorous at the time: Computer Use, which lets Claude operate a computer directly. It can see what is on the screen, move the mouse, click buttons, and type text.

The early user experience was underwhelming. Claude operated the computer slowly, often taking a long time to find a button and occasionally clicking the wrong place. Tech media and social platforms were full of good-natured mockery: "Watching AI use a computer is like watching an elderly person who has just picked one up."

But Dario Amodei, the CEO of Anthropic, repeatedly emphasized a judgment in multiple interviews:

The next breakthrough in large models lies not only in the number of parameters but also in the way the model interacts with the world.

Amodei served as vice president of research at OpenAI for nearly five years, witnessing the evolution from GPT-2 to GPT-3. After leaving, he founded Anthropic in 2021 with this conviction.

At the end of 2024, Anthropic released the Model Context Protocol (MCP), an open protocol that lets AI models connect to external tools and data sources in a standardized way.

If Computer Use gave Claude hands and feet, MCP gave it a set of general-purpose nerve endings, multiplying the surface area of the real world it can touch.
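Concretely, MCP is built on JSON-RPC 2.0 messages between a client and a tool server. A simplified sketch of what a single tool call looks like on the wire; the tool name `search_flights` and its arguments are hypothetical, invented for illustration:

```python
import json

# A client asking an MCP server to invoke a tool, as a JSON-RPC 2.0 request.
# The method name "tools/call" follows the MCP spec; the tool name and
# arguments are made up for this example.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_flights",
        "arguments": {"from": "PEK", "to": "SHA", "date": "2026-03-11"},
    },
}

# The server replies in the same envelope, carrying the tool's result
# as a list of content items.
response = {
    "jsonrpc": "2.0",
    "id": 1,  # matches the request id
    "result": {"content": [{"type": "text", "text": "MU5102 confirmed"}]},
}

# The messages are plain JSON, so any model or runtime that speaks the
# protocol can talk to any conforming tool server.
wire = json.dumps(request)
```

The standardization is the point: one protocol replaces N bespoke plugin integrations, which is why the article calls it "general-purpose nerve endings."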

Claude's main narrative in 2025 was not topping some benchmark chart but the engineering of Agent capability: the stability of long contexts, the reliability of not dropping the ball across multi-step tasks, and the flexibility of integration with external tools.

It is chasing a goal that is harder to quantify: working continuously and reliably on real tasks.

This may sound less romantic. But Sutton's entire theory tells you that this is exactly the path to a higher intelligence ceiling.

V

Working Is Training

This is the most counterintuitive phenomenon worth noting from the past year or so. While its peers attacked capability benchmarks head-on, Claude's large-scale use in real-world Agent scenarios quietly accomplished something Sutton predicted:

It continuously accumulates high-quality decision signals from interaction with the real world, and those signals in turn become fuel for improving the model's capabilities.

The flywheel works like this: users use Claude on real tasks, such as automatically organizing CRM data, completing procurement approvals across systems, adjusting marketing strategy against real-time data, and finishing complex programming projects with Claude Code.

Every success and failure is a signal; every multi-step workflow carries a decision trajectory with causal structure; every tool-call result tells the model "this works, that doesn't."

After de-identification and refinement, these signals feed directly into the model's reasoning depth and self-correction ability.
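One common recipe for turning such signals into training data is rejection sampling: keep only trajectories whose outcome verifiably succeeded, and fine-tune on those. A minimal sketch, assuming a scoring rule that comes from some outcome check (tests passed, plan accepted); the article does not disclose any lab's actual pipeline:

```python
def filter_for_training(trajectories, min_score=1.0):
    """Keep only trajectories whose outcome was verified successful.

    Each trajectory is a (steps, score) pair, where the score comes from an
    outcome check such as a test suite passing or a user accepting the plan.
    This is the rejection-sampling idea: verified successes become
    fine-tuning data, while failures are discarded or used as negative
    signal elsewhere.
    """
    return [steps for steps, score in trajectories if score >= min_score]

# Two logged runs of the same task: one rejected by the user, one accepted.
logged = [
    (["book flight", "book hotel A", "user rejects"], 0.0),
    (["book flight", "book hotel B", "user accepts"], 1.0),
]
training_data = filter_for_training(logged)
```

The key property is that the filter is grounded in environmental outcomes rather than in how fluent the model's text sounded, which is exactly the causal signal a Chatbot transcript lacks.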

By contrast, in the Chatbot paradigm, how many of the billions of conversations between users and ChatGPT meaningfully improve the model's reasoning? "Write me a poem about autumn," "Write quicksort in Python," "How many provinces does China have": repeat them billions of times and they still contain no signal of causal reasoning. They are repeated predictions of language patterns, not growth in intelligence.

This is the fundamental difference between Agents and Chatbots at the level of model evolution: Chatbots feed the model the "shadow of language"; Agents feed it the "skeleton of decision-making."

This is exactly what Sutton has been talking about for decades: Don't try to directly educate or design intelligence; let intelligence grow in the interaction with the environment.

VI

OpenAI's Shift

OpenAI is not unaware of this problem.

Early on, it continuously explored tool invocation and task execution through a series of features such as Function Calling, Assistants, and GPTs.

But the real leap came in January 2025, with Operator, which autonomously completes tasks in a browser, followed by Deep Research, an Agent system that autonomously conducts multi-step research, gathers information across websites, and synthesizes comprehensive analysis.

OpenAI's strategic focus is clearly shifting from "conversation" to "action." The shift itself is a tacit endorsement of Sutton's logic: from a system that pattern-matches on static data to a system that decides and learns in a dynamic environment.