
Interview with Zhang Yaqin of Tsinghua University: Agents Are the Apps of the Large-Model Era

China Entrepreneur Magazine, 2025-06-23 08:27
The next stage of AI is agent AI.

In 2025, the large-model race entered its third year. From the national level down to individual industries, people have gradually realized that this is a long-distance race and that the competition for talent will decide it. A single technical expert can move hundreds of millions of dollars in funding.

Microsoft Research Asia is widely known as the "Whampoa Military Academy" of China's intelligent industry. As its first-generation leader, Zhang Yaqin was an important initiator of that wave of talent and technological change. From scientist to entrepreneur to Tsinghua University professor, his career has become a mirror of the era.

In 1998, Zhang Yaqin returned to China to help establish Microsoft Research China. In 2001, the institute was upgraded to Microsoft Research Asia, with Zhang Yaqin serving as its first head. Since then, the institution has continuously supplied talent to China's Internet and AI industries. Many influential figures in industry and science, such as Wang Jian, Zhang Hongjiang, Lin Bin, and Tang Xiao'ou, have worked at Microsoft Research Asia.

Zhang Yaqin spent 16 years at Microsoft, serving as corporate vice president of Microsoft and chairman of Microsoft China. In September 2014, he joined Baidu as president, and he retired in October 2019. When he joined Baidu, Lei Jun, the chairman of Xiaomi, posted on Weibo to congratulate him: "Congratulations to Zhang Yaqin and congratulations to Baidu. More elites from multinational companies are welcome to join domestic enterprises."

Zhang Yaqin can be said to have witnessed the entire development of China's Internet and AI. In 2015, he wrote a media article that first proposed the concept of "AI +". Even then, he firmly believed that "artificial intelligence will be the mainstream technology in the next 40 to 50 years."

After retiring in 2019, Zhang Yaqin returned to the academic circle and established the Institute for AI Industry Research (AIR) at Tsinghua University. The mission of this institution is to use artificial intelligence technology to empower industrial upgrading and promote social progress.

Recently, Zhang Yaqin published a new book, The Emergence of Intelligence: Transformations and Reflections in the AI Era. Yao Qizhi, Lei Jun, Li Kaifu, and others wrote endorsements. Lei Jun noted that "much of this book concerns opportunities that have already emerged or are about to emerge," such as the entrepreneurial cycle in which physical, digital, and biological intelligence converge, and the trend of autonomous driving reshaping the global automotive industry.

In the book, Zhang Yaqin writes that the value AI brings to individual consumers may be gradual and cumulative, while the change it brings to enterprises and even whole industries may be fast and disruptive.

Intelligent agents are now the most cutting-edge direction for increasing the value of AI. Zhang Yaqin told China Entrepreneur: "Currently, intelligent agents are still in a very preliminary stage. In the market, everyone is talking about intelligent agents, but in fact, most of them may not be real intelligent agents."

He also predicted that the composite IQ of future AI + HI intelligent agents (HI refers to human intelligence) will reach 1200 points, at least an order of magnitude higher than that of humans.

The following is the detailed content (abridged) of the conversation between China Entrepreneur and Zhang Yaqin:

The best way for intelligent agents is to work for humans

China Entrepreneur: Recently, you proposed that generative AI is shifting towards intelligent agent AI, with two breakthroughs: one is the task length, and the other is the task accuracy.

Zhang Yaqin: In recent years, artificial intelligence has shifted from discriminative artificial intelligence, such as speech recognition, image recognition, and face recognition, to generative artificial intelligence, which can generate text, videos, and new protein structures. Now, it is shifting towards intelligent agents and autonomous intelligence.

What is autonomous intelligence? It means that when you tell AI a goal, it can independently find a path and then achieve the goal, becoming more and more like humans. The greatest ability of humans is to set a goal and then plan, make decisions, and take actions.

There are two very important indicators for intelligent agents. One is task length. For a complex task, an intelligent agent divides the work into stages, defines a sub-goal for each stage, optimizes each one, and finally links them together to achieve the overall goal.

In the past seven months, the task length that intelligent agents can handle has doubled, and it may double again in the next six or seven months.
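
As a rough illustration of what this doubling implies, the sketch below extrapolates task length under the assumption, which the interview does not state as a law, that the roughly seven-month doubling period stays constant. The function name and the unit of "task length" are hypothetical.

```python
def projected_task_length(current_length: float,
                          months_ahead: float,
                          doubling_months: float = 7.0) -> float:
    """Extrapolate task length assuming a constant doubling period.

    Assumption: growth is a clean exponential with a fixed doubling time
    (about seven months, per the interview). Real progress may not follow this.
    """
    return current_length * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Task length is expressed relative to today's level (1.0 = today).
    for months in (7, 14, 21):
        print(months, projected_task_length(1.0, months))
```

Under that assumption, three doubling periods (21 months) would multiply the manageable task length by eight.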

The other indicator is accuracy. The task accuracy of intelligent agents can now exceed 50%. For example, if an intelligent agent has to go through 20 different paths or 20 sub-tasks to reach a goal, it achieves the goal 50% of the time. If it fails, the intelligent agent can interact with humans to help complete the task.
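
To see why a 50% success rate over 20 sub-tasks is demanding, the sketch below works out the per-step reliability it implies. It assumes that every sub-task succeeds independently with the same probability, which is a simplification and not something the interview claims; the function names are hypothetical.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` sub-tasks succeed.

    Assumption: sub-tasks are independent and equally reliable.
    """
    return step_accuracy ** steps


def required_step_accuracy(target_success: float, steps: int) -> float:
    """Per-step accuracy needed to hit `target_success` over `steps` sub-tasks."""
    return target_success ** (1.0 / steps)


if __name__ == "__main__":
    # Per-step accuracy implied by 50% success over 20 sub-tasks (about 0.966).
    print(round(required_step_accuracy(0.5, 20), 3))
    # A seemingly good 90% per-step accuracy collapses over 20 steps (about 0.122).
    print(round(chain_success(0.9, 20), 3))
```

Under the independence assumption, an agent needs to get each sub-task right about 96.6% of the time to finish a 20-step task half the time, which is why task length and accuracy have to improve together.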

China Entrepreneur: Memory is also a crucial indicator for intelligent agents.

Zhang Yaqin: Yes. An important feature of intelligent agents now is long-term memory. After an intelligent agent has done many things, it remembers them. For example, next month it can still remember what it did last month. Currently, AI memory systems are still in their infancy.

China Entrepreneur: The concept of intelligent agents has existed for a long time. Why has it become so popular this year?

Zhang Yaqin: In computer science, the concept of intelligent agents has been discussed for decades, but most earlier agents could hardly work. First, the algorithms were not good enough; second, the computing power was insufficient. From 2024 to now, the major changes are that overall computing power has increased about tenfold, inference algorithms have improved, and more and more data is standardized. MCP (Model Context Protocol) can be used to connect different websites and databases. Together, these factors allow current intelligent agents to solve relatively complex tasks.

However, intelligent agents are still at a very preliminary stage. We therefore classify intelligent agents into five levels, L1 to L5, according to their degree of autonomy, much like the levels used for intelligent driving. In the market, everyone is talking about intelligent agents, but in fact, most of them may not be real intelligent agents.

China Entrepreneur: To determine whether it is a real intelligent agent, we look at its task length and task accuracy.

Zhang Yaqin: Yes, along with memory and the ability to reason, plan, decide, and act in a closed loop. If a task can be decomposed in advance into n sub-tasks and each sub-task follows a fixed path, it is probably just automation.

An intelligent agent, by contrast, learns across a broad space, finds what it judges to be the best path, and completes the overall task.
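
The distinction between fixed-path automation and goal-driven path finding can be sketched with a toy example. Everything here is hypothetical: the tiny state graph, the action names, and the use of plain breadth-first search as a stand-in for the learning and planning a real agent would do.

```python
from collections import deque

# Hypothetical toy world: each state maps action names to the next state.
ACTIONS = {
    "start": {"search_flights": "have_options", "open_calendar": "know_dates"},
    "know_dates": {"search_flights": "have_options"},
    "have_options": {"book": "booked"},
    "booked": {},
}


def run_fixed_pipeline(state, steps):
    """Automation: follow a predetermined step list; fail if a step does not apply."""
    for step in steps:
        if step not in ACTIONS[state]:
            return None  # The script breaks as soon as reality differs from the plan.
        state = ACTIONS[state][step]
    return state


def plan_to_goal(start, goal):
    """Agent-like behavior: given only the goal, search for a path of actions."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, next_state in ACTIONS[state].items():
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, path + [action]))
    return None  # No sequence of actions reaches the goal.


if __name__ == "__main__":
    script = ["open_calendar", "search_flights", "book"]
    print(run_fixed_pipeline("start", script))       # works from the expected state
    print(run_fixed_pipeline("know_dates", script))  # fails from a different state
    print(plan_to_goal("know_dates", "booked"))      # finds its own path
```

The fixed script only works from the starting state it was written for, while the goal-driven search recovers a valid path from wherever it begins.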

China Entrepreneur: A report from Sequoia Capital in the US noted some time ago that what intelligent agents deliver to customers has shifted from a process to results.

Zhang Yaqin: Intelligent agents must deliver results. Users tell the agents what to do, and the agents get the job done. The best way for intelligent agents is to work for humans and carry out sets of reasoning tasks.

Another important aspect is that intelligent agents should learn from each other. They evolve by learning from and competing against one another. Multi-agent interaction is therefore an important path to Artificial General Intelligence (AGI). Moreover, the less initial knowledge intelligent agents start with, the better. Let them learn through interaction.

China Entrepreneur: So we do not need to do too much pre-training?

Zhang Yaqin: Of course, some pre-training is needed, but it is an interesting trade-off. The more knowledge you give an intelligent agent, the less freedom it has to develop. Take Go: the initial version of Google's AlphaGo had to learn from hundreds of thousands of game records. Later, AlphaGo Zero did not need those records. It only needed to be told the rules and what winning and losing meant, and it started from scratch and learned through games played among multiple intelligent agents.

China Entrepreneur: Many people may attach great importance to pre-training, but isn't feedback from the real world the key to the next step of intelligent development?

Zhang Yaqin: Just like us humans, we need to learn some knowledge, but the most important knowledge is learned from work and life.

Recently, Richard Sutton, the "father of reinforcement learning", made an analogy. He said that artificial intelligence can be divided into three stages. In the first stage, you are given fish. In the second stage, you are taught how to fish, which is a bit like pre-training plus reinforcement learning. In the third stage, you are not taught how to fish at all. You are first given a taste of fish, which is delicious and makes you hungry, and then you are left to find the fish on your own. The third way can tap your potential to the greatest extent.

China Entrepreneur: Currently, the Scaling Law has changed. You mentioned the Agentic Scaling Law (Agentic SL). What is the Agentic Scaling Law?

Zhang Yaqin: After the emergence of ChatGPT, the most important law is the Scaling Law. The more data and the stronger the computing power, the more accurate the results. When it reaches a certain level, such as 100 million, 10 billion, or 100 billion parameters, the accuracy increases exponentially, which is called the emergence effect.

From 100 billion parameters to 1 trillion parameters and beyond, it basically follows this exponential trend. However, by the end of 2024 and the beginning of 2025, the growth rate is no longer exponential but has flattened. An important reason is that the data has been mostly used up, but the upper limit of intelligence has not been reached.

In addition, the pre-training Scaling Law has shifted. Once you have a model, how do you perform inference? Perhaps the more inference steps, the higher the model's IQ. People are now exploring whether the Scaling Law still holds in the inference stage, especially for intelligent agents. Also, the Scaling Law mainly applies to language. Does it still hold for vision? People are debating this as well.

I think the overall Scaling Law of artificial intelligence still holds, but it has shifted to different areas.

China Entrepreneur: Is it possible that a small model can have great capabilities?

Zhang Yaqin: In the next 5 to 10 years, the mainstream will still be data-driven large models. At inference time, for example at the edge, the model will be smaller, but it will still be based on a large model. A model built from small data and few parameters without a large-model foundation will have difficulty generalizing. A particular algorithm may solve one problem well, but that is not the mainstream of AI development.

Connecting the digital world and the physical world to reach AGI

China Entrepreneur: Is the transition from the bit world to the atomic world an evolution from descriptive intelligence to intervention intelligence?

Zhang Yaqin: I divide it into three different levels. First, information intelligence, which is completely in the digital world, such as language, images, vision, and protein structures; second, physical intelligence, such as robots and driverless cars. When artificial intelligence is applied to the physical world and physical facilities, intelligent agents need to interact with the physical world, take actions, and receive feedback.

Third, biological intelligence, such as brain-computer interfaces. Applying large models to living organisms also involves physical intelligence and information intelligence.

If we define AGI as being able to surpass 99% of humans and complete most tasks, it must rely on the interaction between physical intelligence and biological intelligence. For example, if you want to learn to swim, you need to interact with others and get feedback from the real world. Therefore, the interaction between intelligent agents, including interaction with the environment, is very important.

China Entrepreneur: Is this the focus of the next stage of artificial intelligence development?

Zhang Yaqin: Yes. The real world has a lot of data, but there are also some problems. The tasks are too scattered. For example, a robot can do various things, but in each field, the data is insufficient.

In addition, the real world and the digital world are not connected. In the past, what we did happened in the real world, and the virtual world had its own set of algorithms. The two worlds could not be connected, and strategies trained in the virtual world did not work in the real world. Therefore, we proposed RSR (Real2Sim2Real), which aims to connect the information world and the physical world in a closed loop.

Photography: Deng Pan

China Entrepreneur: In the RSR process, which step of data feedback is the most difficult?

Zhang Yaqin: RSR first needs to understand the physical world. Abstracting a particular action, for example, is quite difficult. Once abstracted, the action is turned into model parameters in the digital world, and the model is trained to generate various kinds of data, such as robots making breakfast or climbing mountains. Then, when the robots return to the real world to work (Sim2Real), they often fail.

Because the real world and the virtual world do not form a closed loop and are out of sync, a machine that works in the real world for a long time will drift, and what it learned in the virtual world becomes useless.
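
A toy numerical sketch can make the drift argument concrete. It is not the RSR method itself: it assumes a one-dimensional state, a simulator whose per-step prediction is off by a small constant `model_error`, and a hypothetical `resync_every` parameter standing in for the real-world feedback that closes the loop.

```python
def max_sim_real_gap(steps, model_error=0.05, resync_every=None):
    """Largest gap between simulated and real state over a run.

    Assumptions (illustrative only): the simulator advances 1.0 per step while
    the real system advances 1.0 + model_error. If `resync_every` is set, the
    simulator is reset to the observed real state at that interval.
    """
    sim = real = 0.0
    worst = 0.0
    for t in range(1, steps + 1):
        sim += 1.0                  # what the virtual world predicts
        real += 1.0 + model_error   # what actually happens
        worst = max(worst, abs(real - sim))
        if resync_every is not None and t % resync_every == 0:
            sim = real              # closed loop: feed real data back into the sim
    return worst


if __name__ == "__main__":
    print(max_sim_real_gap(100))                   # open loop: error keeps growing
    print(max_sim_real_gap(100, resync_every=10))  # closed loop: error stays bounded
```

In the open-loop run the gap grows with every step, while periodic feedback from the real world keeps it bounded by what can accumulate between two synchronizations.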

China Entrepreneur: Can we better understand this problem in the context of autonomous driving?

Zhang Yaqin: The data for autonomous driving is insufficient. In complex traffic, for example, a driverless car encounters all kinds of situations. If you use a simulator to generate various long-tail scenarios, the system makes a decision each time a scenario is generated. But if you apply this set of solutions to the real world, you will find that the two are out of sync. First, there is still a big difference between the scenarios described by AI and real scenarios. Second, new cases will always come up in the real world, you cannot simulate all of them, and the algorithms may not converge. This requires adding rules continuously.

Therefore, everyone is now working on end-to-end solutions, integrating the perception, reasoning, and decision-making modules into one large model to achieve end-to-end decision-making. First, there is a closed loop among these modules; second, there is a closed loop with the real world. However, truly achieving a full closed loop is still a research topic.

China Entrepreneur: What are the differences between the risks of intelligent agents and the risks of AI?

Zhang Yaqin: An intelligent agent is a set of reasoning tasks. Now, it can keep reasoning because it is looking for a path to complete the task. The longer the task, the greater the possibility of losing control. For example, when it is looking for a path, it may violate some rules that we have not clearly defined.

China Entrepreneur: How can VLA (Vision-Language-Action) models achieve multi-modal fusion? They also face the semantic gap.

Zhang Yaqin: (It depends on) the world model, including the semantic understanding of different modalities, the understanding of behavior, and the understanding of common sense. Machines are still far behind in learning common sense.

New machines and algorithms still need time to learn. Some first principles, such as Newton's laws and other physical laws, need to be built in. But to be honest, we can't just calculate formulas all day. We still need to train through common sense.

We humans can learn many things clearly and simply, but machines may find them difficult. Similarly, fields that we find difficult to learn may be easy for machines. These two types of intelligence are actually quite different.

China Entrepreneur: In