
Academician ZHANG Yaqin: In the end, there will be no more than 10 foundational large models. In ten years, there will be more robots than humans.

QbitAI, 2025-12-11 17:59
The Agent Internet is the next stop for AI.

From ChatGPT to DeepSeek, AI is entering a new wave along the path of "Intelligence +".

At the critical juncture when large models are shifting from "computing power stacking" to "inference priority", Zhang Yaqin, the founding dean of the Institute for AI Industry Research (AIR) at Tsinghua University and a foreign academician of the Chinese Academy of Engineering, proposed:

The new round of artificial intelligence is the integration of information intelligence, physical intelligence, and biological intelligence. In essence, it is also the integration of atoms, molecules, and bits.

That is to say, as long as the scaling law continues to hold, once parameter scale, data volume, and computing power cross a certain threshold, intelligence no longer stops at pattern recognition but starts to "emerge":

First, it moves from discriminative AI to generative AI, and then from generative AI to a new paradigm represented by agents.

At the QbitAI MEET2026 Intelligent Future Conference, he also regarded ChatGPT and DeepSeek as two important milestones in this round of evolution:

The former incorporates data such as text, voice, images, proteins, and point clouds into the same space through unified representation and tokenization;

The latter pushes large models from the "pre-training era" to the "DeepSeek Moment" centered on inference with high efficiency, high performance, low price, and an open-source path.

As for the main battlefield in the next 5-10 years, in his view, it will move towards the "Internet of Agents Era":

The number of foundational large models will converge to no more than 10 globally, much like operating systems, and agents will replace most of today's SaaS and apps, becoming the default way for enterprises and individuals to interact with the world. At the same time, this is also the inevitable path to AGI.

To fully present Zhang Yaqin's thoughts, QbitAI edited and organized the speech content without changing the original meaning, hoping to bring you more inspiration.

The MEET2026 Intelligent Future Conference is an industry summit hosted by QbitAI. Nearly 30 industry representatives participated in the discussion. There were nearly 1,500 offline participants and over 3.5 million online live-stream viewers, attracting extensive attention and reports from mainstream media.

Summary of Core Views

The new round of artificial intelligence is the integrated evolution of information intelligence, physical intelligence, and biological intelligence driven by unified token representation and the scaling law.

Represented by ChatGPT and DeepSeek, AI is moving from discriminative to generative and inferential, and is accelerating implementation in an environment of high efficiency, low cost, and an open-source ecosystem.

Generative AI is rapidly evolving into agents, with the task length and capabilities increasing simultaneously, and the risks amplifying as well.

Foundational large models are the operating systems of the AI era. Globally, there will be no more than ten players, and a new industrial pattern of "foundational models + vertical/edge models + agent networks" will take shape.

The Internet of Agents is the biggest direction in the next 5-10 years and the inevitable path to AGI. It is expected to complete the leap from information intelligence to physical intelligence and then to biological intelligence in 15-20 years.

...

The following is the full text of Zhang Yaqin's speech:

From ChatGPT to DeepSeek: The New Round of AI Paradigm and "Intelligence Emergence"

Today, I'm talking about the trend of AI +. I've been thinking about this topic for nearly ten years. I wrote a book called Intelligence Emergence, which summarizes the development of AI over the past decade and includes the articles on "Intelligence +" and "AI +" that I wrote ten years ago.

First of all, the new round of artificial intelligence is the integration of information intelligence, physical intelligence, and biological intelligence. Our information world, physical world, and biological world are all moving towards digitalization. So, it is also the integration of atoms, molecules, and bits.

Artificial intelligence has a history of 70 years, and it has developed especially rapidly in the past five to ten years. An important milestone was ChatGPT in 2022, exactly three years ago.

What's particularly important is that ChatGPT has brought about the shift from discriminative AI to generative AI. In the past, it was more about pattern recognition. Now, we can create new content. There are three particularly important concepts here:

The first is unified representation, that is, tokenization. Whether it's text, voice, pictures, videos, proteins, DNA, cells, or 3D lidar point-cloud signals, they can all be turned into unified tokens.

The more tokens, the more data, the stronger the computing power, and the better the algorithm, the more accurate the model will be.

When scaling reaches a certain level, intelligence emergence occurs, which is also the title of my book. Another important moment is the DeepSeek Moment in China. After DeepSeek emerged, it first shifted the focus of the entire model from pre-training to inference.

Another important point is that it brought many algorithmic, architectural, and system innovations. It can be described as high-efficiency, high-performance, and low-price.

At the same time, it is a new business model. It is open-source under the MIT license, which is among the least restrictive open-source licenses. So, after DeepSeek emerged, it significantly accelerated implementation and application both in China and globally. Therefore, I call it the DeepSeek Moment, which is also a Chinese path.

Five Trends in AI Development

There are five trends in AI development. First, generative AI is moving towards agent-based AI.

Agents are a new development in AI in the past two years and the most important innovation.

In the past seven months, the task length of agents has tripled, and the accuracy rate is greater than 50%, which is actually on par with humans.

The second important trend is that the scaling law has slowed down in the pre-training stage. Although intelligence is still advancing, more intelligence is being placed in the post-training, inference, and agent stages.

An interesting phenomenon here is that the unit cost of inference has decreased by 10 times in the past year, but the computing power requirements of agents themselves have also increased by 10 times in a year. So, one is multiplied by 10 and the other is divided by 10, which just balances out.
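The balance described above can be checked with simple arithmetic. The following is a minimal sketch, assuming total spend is just unit cost multiplied by the amount of inference consumed; the starting figures are arbitrary illustrative values, not numbers from the speech.

```python
# Illustrative numbers only; they are not figures from the speech.
unit_cost_last_year = 100.0   # cost per unit of inference, arbitrary units
demand_last_year = 1_000.0    # units of inference an agent workload consumes

# The two shifts described in the speech: cost falls 10x, demand rises 10x.
unit_cost_now = unit_cost_last_year / 10
demand_now = demand_last_year * 10

# Assumed model: total spend = unit cost * demand.
spend_before = unit_cost_last_year * demand_last_year
spend_after = unit_cost_now * demand_now

print(spend_before, spend_after)  # 100000.0 100000.0
```

Under this simplified model the two factors cancel, so total spend per agent workload stays flat even though each unit of inference is far cheaper.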

Third, we are moving from information intelligence to physical intelligence and biological intelligence, and large language models are moving towards vision-language-action (VLA) models.

There are two important nodes here. One is self-driving. Self-driving has reached its ChatGPT Moment this year, and its DeepSeek Moment will come in 2030, when about 10% of new cars will have L4-level self-driving capabilities.

The other is robotics, which is definitely the biggest track in the future. Although humanoid robots still need many years to develop, I think that in about 10 years, the number of robots will exceed the number of humans.

The bad news is that the risks of AI are rising rapidly. After the emergence of agents, our risks at least double.

The Biggest Development Direction in the Next 5-10 Years: The Internet of Agents

If we look at the new industrial pattern, we have foundational large models, which are like operating systems, with vertical models and edge models on top.

Globally, the number of foundational large models may not exceed 10 in the end. Half may be from China, half from the United States, and maybe one or two from other countries. China and the United States have taken different paths, but both are leading globally.

This also includes open-source and closed-source. I remember last year we were still arguing about whether to focus on open-source or closed-source. Now, it's clearer: open-source will become a larger and more extensive platform and ecosystem. Maybe 80% will be open-source, and 20% will be closed-source.

This graph clearly shows that the pre-training stage of the scaling law is still rising, but the rate has flattened. Post-training is increasing, and the development of agents is rising linearly.

Another point is that agents themselves are not only a technology. They are actually forming networks and a new economic form.

So, if we look at the enterprise architecture of the future, it will be a completely different concept. Enterprises will need GPUs, large models, data, and human resources, some of which may be people and some agents. This will have a great impact on future enterprise management and product development.

This graph shows the future technical architecture. I drew the left-hand graph shortly after ChatGPT emerged. If you follow the yellow line on the left, you can see that the foundational large model is a platform, with various vertical models in different fields on top and SaaS (Software as a Service) above that, plus a relatively small model distilled at the edge, with apps on top. This was the architecture I envisioned at that time.

In October, I updated this architecture. The most important point is that I believe in the future, our SaaS and mobile apps on the device or edge will be replaced by agents. That is to say, agents are the future SaaS and apps.

For example, there can be various types of agents, including consumer-oriented ones, those for different industries, for robots, and for self-driving.

Professor Liu Yang at Tsinghua University developed a medical agent, which is also the world's first agent-based unmanned hospital. The goal is to use agent networks and multiple agents to simulate a real top-tier hospital, which of course includes patients, doctors, nurses, and various departments.

In this virtual world, multiple agents will interact and learn from each other and then continuously and rapidly evolve. In a short period, say two days, it can complete the cases that a top-tier hospital would handle in two to three years, and with much higher accuracy.

However, we are not saying that agents will replace future doctors. They are more like assistants. We believe that every doctor will have their own agent in the future.

So, if we look at the industrial opportunities, I regard foundational large models as the operating systems of the AI era. They will completely rewrite, reconstruct, and reshape our industrial forms.

Just as we had Windows in the PC era and Android and iOS in the mobile era, in the AI era, the operating system is a foundational large model. With this operating system in place, the underlying chip architecture and the upper-layer application ecosystem will all be completely reconstructed.

That's why current chips are mainly GPU-based, and the ecosystem above is mainly composed of vertical models, edge models, and agents. The overall scale will be one, two, or even three orders of magnitude larger than that of the PC era and the mobile era.

From the perspective of Internet development, we started with PC interconnection, then moved to mobile interconnection, and then to the Internet of Things. Now, we are moving towards the Internet of Agents. I believe it is the biggest development direction in the next 5 to 10 years.

Agents are also the inevitable path to achieving AGI (Artificial General Intelligence). This requires a new algorithm system, such as a new memory system and world model.

I think that in the next five years, the current autoregressive architecture, Transformer, and Diffusion may be subverted. With these things, we can achieve general artificial intelligence.

How long will it take exactly?

I think it may take 15-20 years. First, it will be information intelligence, then physical intelligence, and finally biological intelligence.
