
The AI scientist who loves milk tea the most wants to create the "intelligent agent" that understands you best.

Fu Chong, 2025-11-24 15:57
From returning from abroad to teach in China, to starting a business, to collaborating on projects with large companies, every step of Wu Yi's journey reflects his own bold judgment and offers a microcosm of today's Chinese AI entrepreneurs.

Text by | Fu Chong

Edited by | Su Jianxun

Whether in school research or in collaborative projects with large companies like Ant Group, Wu Yi hopes his team can maintain an entrepreneurial mindset: not afraid of making mistakes and able to iterate quickly.

As an assistant professor at the Institute for Interdisciplinary Information Sciences of Tsinghua University and the person in charge of the AReaL project, Wu Yi focuses on reinforcement learning algorithms and AI application innovation. In May 2025, his Tsinghua team and Ant Research Institute jointly open-sourced the first asynchronous reinforcement learning training framework, AReaL-lite, which can significantly improve AI training efficiency and reduce GPU waste.

As a post-90s technology leader, Wu Yi requires his team to "grow through trial and error." The excuse he dislikes most is "I don't have the resources, so I can't do the work," because the essence of starting from scratch is to create resources.

At the Bund Summit in September this year, Wu Yi's product philosophy also reflected this: Once a product is made, release it quickly. Even if the market feedback is not good, you need to know where the problems are and make corresponding improvements. Don't wait for a perfect start.

This understanding of innovation stems from Wu Yi's previous entrepreneurial experience. In 2023, his team founded Frontier Technology, an AI Agent company based on reinforcement learning, which is also the predecessor of AReaL.

Due to their similar backgrounds and research experiences in the AI field, Wu Yi, Chen Jianyu (founder of Xingdong Jiyuan), Gao Yang (co-founder of Qianxun Intelligence), and Xu Huazhe (chief scientist of Xinghaitu), four AI scholars who studied in the United States, are collectively known as the "Four from Berkeley."

Few people know that Wu Yi was the first among the four to decide to return to China. It was his suggestion and promotion that led to the return of the other three.

Wu Yi likes to do pioneering things. At Tsinghua University, he often tells his students that innovation means going into uncharted territory. He firmly believes that AI innovation cannot be a "gamble" that spreads bets across many directions. Instead, it should stem from in-depth thinking and long-term perseverance.

He has a distinctive view of the future of AI: intelligent agents will surely be able to understand humans' vague intentions, complete long-term tasks, and ultimately move from the digital world to the physical world, becoming the "brain" of embodied intelligence.

In his speech at this year's WAIC, he gave an example: In the future, you only need to tell a robot to "tidy up the room," and it will spend several hours to complete the task properly.

Regarding this goal, Wu Yi believes that the reinforcement learning training method he is engaged in will be the key to significantly improving the intelligence level of AI.

Reinforcement learning lets AI learn independently through practice and develop the ability to explore. In contrast, supervised learning requires humans to constantly tell AI how to work, an approach that is difficult to apply to long-term tasks.

△ After attending the robotics academic conference IROS in Hangzhou, Wu Yi posted on Xiaohongshu. In the photo, he was holding a milk tea and smiling happily. Photo provided by the interviewee

Wu Yi, who is rigorous in his professional field, shows another side on social media.

This self-proclaimed "high-energy introverted PhD supervisor" often shares his research progress on Xiaohongshu and is also willing to answer questions about AI job hunting and development.

Because he likes to drink milk tea, Wu Yi not only carefully selects his top 5 milk tea flavors but also takes photos at his favorite milk tea brands.

△ Wu Yi likes milk tea. The recruitment information he posted on Xiaohongshu also features a cup of milk tea in the picture. Photo: Screenshot from the Internet

Recently, Wu Yi was interviewed by "Intelligent Emergence" and shared many thoughts on the future of AI and entrepreneurship, including methods to help him make quick decisions and improve team efficiency. The content is organized by the author:

The Future of AI Lies in Smart Agents

Intelligent Emergence: Currently, there are no widely popularized AI applications. Where do you think the future opportunities for AI products lie, and how will they serve the public's lives?

Wu Yi: I think it is an irreversible trend for AI to be able to complete long-term tasks. In addition, people's commands to AI will become simpler and more ambiguous.

It's still hard to say what the final product form will be, but there will eventually be a change in AI products from "users need to actively drive AI" to "AI anticipates what users want and completes the task in advance."

This kind of change has already happened in the mobile Internet. For example, in the era of search engines, people actively searched for information when they had needs. Then there was Zhihu, and later various products from ByteDance. Algorithms can push the content users want to them, allowing users to passively receive information.

So I think people will gradually forget about the active search dialog box. Smart AI can serve the needs of "lazy" people more and more.

Ultimately, a brand-new product will surely emerge, which represents a great opportunity in this era.

Intelligent Emergence: You mentioned in events like WAIC that when an agent has a physical body, it becomes an embodied agent, which can interact with the physical world. In short, this is an AI robot. What kind of work can an embodied agent do?

Wu Yi: A smart embodied agent can accurately infer users' intentions from vague instructions, complete tasks with high quality, and even anticipate needs that users haven't yet realized they have.

For example, if you tell a robot at home that you can't find your power bank, it will reason and act on its own, and help you find it based on your usage habits and the location where you last used it in its memory.

Intelligent Emergence: Can smart embodied agents collaborate with each other? How do multiple embodied agents cooperate?

Wu Yi: Embodied agents can cooperate with each other to complete more complex tasks.

For example, in a robot football team, when the robots face a situation they have trained for, they can tell what formation to take from a mere glance at each other, just like human players.

If there are multiple smart and useful agents, the next step is to define how they communicate with each other.

In the digital world, agents may communicate in a way that a master agent drives many small agents. You can use different models, or you can use one model, but the structure is like one person making continuous plans while many people execute around those plans simultaneously. This is the so-called Multi-Agent System.
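The master-and-workers structure described here can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `plan`, `worker`, and `run` are hypothetical names, and both agents are plain functions standing in for model calls, which a real system would replace with its own models and tools.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(goal: str) -> list[str]:
    """Hypothetical master agent: split a goal into independent subtasks."""
    return [f"{goal}: step {i}" for i in range(1, 4)]


def worker(subtask: str) -> str:
    """Hypothetical worker agent: a real system would call a model here."""
    return f"done({subtask})"


def run(goal: str) -> list[str]:
    """One planner produces the plan; several workers execute it concurrently."""
    subtasks = plan(goal)
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        # pool.map returns results in plan order even though workers run in parallel.
        return list(pool.map(worker, subtasks))


if __name__ == "__main__":
    for result in run("tidy the repository"):
        print(result)
```

Whether the planner and the workers share one model or use different ones is hidden behind the two functions, which matches the point that the structure matters more than the choice of model.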

I often use the cooperation between Claude Code and Gemini as an example.

Claude Code has strong coding abilities, but it has a short context and high cost. While Gemini is less capable, it can handle a large amount of content. So you can let Gemini read the entire code base first, filter out the most critical content, and then hand it over to Claude Code to write the code.

It's like a smart but physically weak person collaborating with a person with infinite physical strength but less intelligence. Together, they form an efficient multi-agent system.

In embodied scenarios, for example, several robots may need to clean a space together. After "communicating," they agree on a task plan, with some responsible for sweeping the floor and others for mopping, and they cooperate to complete the task.
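The division of labor in this example, where a long-context reader filters the code base and a short-context coder then works on what is left, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: `select_context` and `write_code` are hypothetical stand-ins, the keyword-overlap scoring replaces whatever a real long-context model would do, and no actual Claude Code or Gemini API is called.

```python
def select_context(codebase: dict[str, str], task: str, budget_chars: int) -> dict[str, str]:
    """Stand-in for the long-context reader: keep the most relevant files
    that fit within the coder's limited context budget (hypothetical heuristic)."""
    keywords = set(task.lower().split())

    def score(text: str) -> int:
        lowered = text.lower()
        return sum(lowered.count(word) for word in keywords)

    ranked = sorted(codebase.items(), key=lambda item: score(item[1]), reverse=True)
    selected: dict[str, str] = {}
    used = 0
    for path, text in ranked:
        # Skip irrelevant files and files that would overflow the budget.
        if score(text) == 0 or used + len(text) > budget_chars:
            continue
        selected[path] = text
        used += len(text)
    return selected


def write_code(task: str, context: dict[str, str]) -> str:
    """Stand-in for the short-context coder: it only sees the filtered files."""
    return f"# task: {task}\n# context files: {', '.join(sorted(context))}"


if __name__ == "__main__":
    repo = {
        "auth.py": "def login(user): check password",
        "ui.py": "render button",
        "db.py": "def save(user): store user record",
    }
    task = "fix login password"
    print(write_code(task, select_context(repo, task, budget_chars=100)))
```

The budget parameter models the coder's short context: the reader sees everything, while the more expensive coder is handed only the files that survive the filter.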

Intelligent Emergence: How can we transition from digital agents to embodied agents in the physical world?

Wu Yi: The transition from the digital world to the physical world requires multi-modal data, and the training environment also moves from the computer to the real world.

In the digital world, the tools used are basically bits, which have a high execution success rate. So basically, you can write a piece of code to execute the corresponding function, and the certainty is relatively high. Of course, writing code itself is not easy.

However, when using tools in the physical world, for example, carrying a bag and opening a door, the error rate of robots in executing such tasks is still quite high at present. Therefore, the development of embodied intelligence will be more complex and slower.

However, from a macro perspective and over the long term, if one day the underlying physical world of agents has been digitized to a large extent, the core technical challenges of various agents will ultimately be unified.

For example, if we really have a machine that can successfully call most physical world tools 100% of the time, then constructing an embodied agent that can operate autonomously for a whole day on this basis is actually not much different technically from an agent in the bits world.

△ A photo of Wu Yi and his Berkeley-era advisor Stuart Russell at this year's WAIC. Photo provided by the interviewee

AI Innovation Cannot Rely on a "Gamble"

Intelligent Emergence: You interned at ByteDance, your team founded Frontier Technology, and then you chose to cooperate with large companies to promote reinforcement learning technology. Looking back on this journey, what are your thoughts?

Wu Yi: In the early days of the Frontier Technology team, we made many mistakes in personnel selection. At that time, many employees came to work with a regular-job mindset and didn't realize what entrepreneurship meant. Objectively speaking, the whole team was not really ready and didn't quite meet the entrepreneurial spirit of the AI era. Of course, it was everyone's first time, and making mistakes was inevitable.

One sentence I dislike very much now is, "I don't have the resources, so I can't do something." Entrepreneurial teams don't have abundant conditions. People create resources to achieve their goals.

Therefore, entrepreneurial teams actually need people with the spark of innovation and the corresponding awareness.

There is no such thing as a "gamble" in innovation. Entrepreneurship requires firm belief in what you are doing. We don't have enough resources to bet on different tracks and hope that one will succeed in the future. This will result in many mediocre solutions.

The entrepreneurial spirit is that I firmly believe that even if I can't achieve something, it is the right thing to do, and it will be realized one day, even if not by me.

Intelligent Emergence: You were the first among the "Four from Berkeley" (referring to Wu Yi, Gao Yang, Xu Huazhe, and Chen Jianyu, four young scholars who graduated from the University of California, Berkeley, and are currently active in the fields of AI and embodied intelligence) to decide to return to Tsinghua University to teach, and then you led the others to return to China. Why?

Wu Yi: In August 2018, I finished my internship at ByteDance in Beijing. Although I got my PhD at Berkeley, I was actually quite influenced by ByteDance.

Since 2016, I have intermittently interned in different teams at ByteDance in Beijing. I was also one of the earliest members of ByteDance's AI Lab and happened to witness the end of China's mobile Internet era. After finishing my last internship at ByteDance in August 2018, I decided to return to China.

On the one hand, I felt the huge development opportunities in China. On the other hand, I clearly felt the glass ceiling for Chinese people in the United States. Unless you become an American, there is a fundamental question: if you want to make an impact, do you want to do so as a Chinese person or as an American? I found that I didn't want to compromise and become an American.

When facing choices, many people say, "I'm not ready now. I'll do it when I'm ready in the future." For example, regarding returning to China, some people say, "I'll develop in the United States for a while and then return to China in a few years."

But I have a theory: If you are sure you want to do something in the future, the best time was in the past, and the second-best time is now. So I thought, why not just choose to return to China.

What to do after returning to China? After thinking for a month, I declined ByteDance's return offer. In October 2018, I knocked on Mr. Yao's office door and chose to return to Tsinghua University as a teacher.

Then I shared my thoughts with some of my Berkeley classmates at that time and told them to come back quickly because there were opportunities. My idea was simple. When I saw good opportunities, I wanted to share them with others, and it really influenced some people.

Looking back now, that was indeed good timing for returning to China. As early-returning scholars, we did enjoy some benefits.

Intelligent Emergence: It seems that you always take on challenges, learn while adjusting, and then make progress. For example, you first chose a major you didn't like in your PhD and then switched to reinforcement learning. Among the scholars who returned to China at the same time, you seemed to be the first to start a business. When your classmates started to do business, you chose to cooperate with large companies. Does your experience sound like a reinforcement learning process?

Wu Yi: Yes, I've been through a reinforcement learning process all the way, making mistakes one after another. I've managed to quickly make all the mistakes I could think of. Haha, I feel that learning through making mistakes is more profound and has better generalization ability than SFT (Supervised Fine-Tuning).

Making a product is similar. I often say that once a product is made, it should be released quickly. In the AI era, even good wine is afraid of being in a deep alley. You need to quickly bring the product to the market for people to use and get feedback. Even if the market feedback is negative, you'll know where the problems are and can quickly iterate through trial and error.

Of course, I also want to tell everyone that if you have high-quality SFT data, then reinforcement learning can be more efficient, because exploration driven by negative feedback is quite costly. So I hope to share my experiences and views with everyone to help them progress faster.

Intelligent Emergence: Pioneering opportunities often mean there isn't much reference experience. How do you convince yourself to make a decision?

Wu Yi: When I need to make a decision, I have a method for deciding quickly: I first flip a coin. Before the coin lands, I already know the answer in my heart.

I'm always the one who flips the coin first.

Intelligent Emergence: For you, is it more important to do what you want or to have the glory? Would you be willing to achieve great results while remaining anonymous?

Wu Yi: Yes, I would.

I've thought about this question: If I can build a good startup from scratch, and then the company enters the stage of scaling up from 1 to 100 and the organization grows rapidly, and I'm no longer the most prominent manager. Can I accept this? The answer is yes.

At that turning point, I'll probably introduce professional managers and start another project from scratch. The reason is simple. Scaling up from 1 to 10 or even 1 to 100 often requires the collaboration of hundreds of people, and such large-scale management isn't what I enjoy the most.

However, I'm also reflecting on whether I'm restricted by this idealistic state. Maybe when that time comes, I'll make