15 sentences summarizing Yao Shunyu's first public appearance in person
At the Tencent Cloud AI Second Half Conference on June 5th, Tang Daosheng, the senior executive vice president of Tencent Group, had a dialogue with Yao Shunyu, the chief AI scientist of Tencent.
This conversation revolved around how Tencent understands the second half of AI: As the methodology of large models matures, the focus of competition is shifting from the ability of single-point models to real scenarios, product feedback, context networks, and the engineering implementation of Agents.
The dialogue between Tang Daosheng and Yao Shunyu also revealed that Tencent is reorganizing internally through the co-design of models and products.
We organized all the conversations between the two and summarized them into 15 core viewpoints.
01 About the Second Half of AI
The term "second half" has been overused. The reason Yao Shunyu joined Tencent is that Tencent has "good problems", the kind of good problems that real AI needs
Yao Shunyu said that the concept of the "second half of AI" has been somewhat overused. He believes that in the past few decades, the more important thing in AI was to find good methods. For example, AlphaGo was developed for Go, and specific models were created for translation. However, after the maturity of pre-training and post-training, large models have become a "universal hammer" that can solve various problems. Thus, what has truly become scarce are "good problems". After the model's capabilities become more general, enterprises need to determine where to apply it, what problems to solve, and what value to create.
This is also an important reason why Yao Shunyu joined Tencent. "Tencent has many good problems and many products," Yao Shunyu said. Good products can solve the first problem, which is to determine what scenarios to apply the model to and where its value lies after good pre-training and post-training.
The environment is important, and context is even more important. Sometimes, the competitive barrier comes from having the most original input
Yao Shunyu emphasized the importance of the environment. Without a good environment, Agents cannot do various things. If you don't have a tool for ordering takeout, you can't order takeout.
But the most important thing is context. Yao Shunyu said that for both enterprises and individuals, context is becoming increasingly important. Models are becoming more and more proficient at transforming a very complex input into an output, so the competitive barrier often comes down to whether you hold the most original input. Do you know what this person is doing? Do you know all kinds of information about this enterprise? In this regard, Tencent has a very strong advantage.
The most important goal in the second half of AI is to establish a long-term AGI-based organization in China
Yao Shunyu's personal goal is to establish a long-term AGI-based organization in China. He mentioned that today's AI mainly consists of three parts.
First is the foundation part, which is about how to make the most basic things like pre-training and post-training very solid.
The second part is the product, which is about how to use such technology to create value for people and society.
The third is the frontier, which is about how to explore new research paradigms and new opportunities.
Most importantly, the organization needs to be balanced, like a triangle. For the foundation, the most important thing is to have sufficient resources, followed by the right way of doing things. For products, good product sense and people who can build products are crucial. As for the frontier, there is not enough frontier exploration in China today, so Yao Shunyu hopes to inject more of that exploratory spirit into the organization.
Tencent pursues the co-design of models and products, but Yao Shunyu believes that the premise of everything is still the model
When talking about the co-design frequently mentioned within Tencent, Yao Shunyu believes that the first premise is to make the model itself solid. Pre-training is a matter relatively independent of products. It provides a generalizable foundation that can continuously benefit various downstream tasks.
In terms of post-training, the most important thing is to set up the right evaluation. Yao Shunyu complained that there may be an unhealthy tendency in China to chase the top of the leaderboards. What deserves more attention is how to build realistic evaluations grounded in products and real applications. Practical value matters more than topping the leaderboards.
In this regard, Tencent has carried out in-depth co-design with various products. Yao Shunyu said that a key point of co-design is to build mutual trust, and Tencent has put a lot of work into achieving this. There are many details involved: how to make good use of product data, how to make good use of feedback, and how to do evaluation well.
Real product feedback can discover problems that benchmarks cannot see
Yao Shunyu does not deny the value of benchmarks, but in comparison, real-world data has at least three types of value.
First, it can discover bottom-line problems that the leaderboards cannot expose. Yao Shunyu said that one of the most important purposes for Tencent to release a preview model is to obtain real-world feedback and fix bottom-line problems that cannot be found in various leaderboards. This will be greatly improved in the official version.
Second, it helps to understand the prompt distribution of real users. Benchmark questions are usually very precise, with long and specific descriptions, and are generally single-round. In real scenarios, the questions people ask tend to be vague, perhaps just one or two sentences, and are followed by multiple rounds of follow-up questions. These differences in setup can inform how to do better training.
Third, the product itself may also inspire new evaluation directions and promote the development of ability areas that have not been well-defined. Yao Shunyu said that one can even get some inspiration from these products to promote leaderboards that do not exist yet or areas that have not been well-defined. For example, Tencent has recently done a lot of work on context learning, and the feedback from Yuanbao has also given great inspiration and help.
02 About Model Generalization
In the name of the model, Tencent's different products finally have a bit of "mutual circulation"
Yao Shunyu pointed out that the fundamental difference between the LLM era and the past AI lies in generalization. In the past, when making a translation model, only translation data was needed. When making a Go program, only Go data was required. But today, even when only developing a Coding Agent, multiple capabilities such as chatting, searching, instruction following, and reasoning are needed.
Therefore, companies with multiple product scenarios will have a systematic advantage. Yao Shunyu said that the co-design with Yuanbao can enable the model to have strong chatting and searching capabilities. And such capabilities can be transferred to other products such as ima and WorkBuddy. So these products can provide different data, and the data can spread and transfer among each other, forming a network-like system. The value of this will become increasingly important.
Previously, Tencent's approach was described by the outside world as a "horse race". Different businesses developed products in the same direction, competing with each other, and there was little sense of synergy. Now, it seems to be changing in the name of AI.
The core changes of Hy3 are to rebuild the infrastructure, redo the data, and rely on a large number of taste-driven decisions
Regarding the Hy3 Preview, Yao Shunyu said that "there is no secret in large models". It is necessary to do a good job in infrastructure and data, while the algorithm part is relatively simple.
He mentioned that Hunyuan 3 mainly made several changes. The first is rebuilding the pre-training and reinforcement learning infrastructure. The second is making major changes to the data, including defining more realistic problems, enriching the data taxonomy, and improving data quality, which is an endless pursuit. The third is that many key decisions have no clear formula and require continuous trade-offs in recruitment, model release rhythm, and resource allocation. In essence, it is a very taste-driven process.
The most difficult part of the cooperation between Yuanbao and Hunyuan is not technology, but trust
Yao Shunyu revealed that in the early stage of Yuanbao, Hunyuan sent strong algorithm engineers to help Yuanbao with the post-training of DeepSeek first. At that time, Hunyuan's own pre-trained model was not ready yet, and many colleagues on the algorithm side did not understand the decision at first.
But Yao Shunyu believes that maintaining a product like Yuanbao and its DAU is very important for subsequent model development and long-term cooperation, and he had to work hard to explain this. Now it seems that these efforts have paid off. This action made colleagues on the product side realize that the model team really had the product in mind. It played a very important role in the subsequent cooperation, including the successful launch of Hunyuan on Yuanbao.
Yao Shunyu said that the goals of model development and product development are aligned in many ways and misaligned in many others. Model developers want capabilities to be as strong as possible, while product developers want to meet users' needs as well as possible, so some misalignment is natural. There are many technical points to discuss, but perhaps the most difficult part is building trust and thinking from the other side's perspective.
The paradigm of product development in the AI era has changed from "pre-made dishes" to open services
Tang Daosheng believes that the first principle of product development has not changed. Ultimately, it is still about what users need, how to solve their pain points, and how to create value for users or customers. In any era and in any industry, a product has to bring value to users before they will pay for it and use it.
However, there are indeed many differences between developing products in the PC Internet and mobile Internet eras and developing products in the AI era today. First, in terms of paradigm, before the AI era people usually thought about meeting users' needs through functions. As a product or service provider, one had to figure out what capabilities to offer so that users could choose among them through the interface or menus. This is a bit like pre-made dishes: users can only pick from what is already on offer.
But in the AI era, the open service form of product development will bring very different requirements and challenges. Users interact in simple ways, which may be natural language or voice. As a product provider, you don't know what users will ask, so you need to fully utilize the model's capabilities to understand users' needs. Then, through the reasoning ability and tool-calling ability of today's large models, the product provides various tools that the model can use to meet these open-ended needs.
Tang Daosheng said that developing products in the AI era requires more comprehensive capabilities and is more difficult. Especially this year, most code is generated by AI. Engineers may spend more time on design and architecture, leaving the code-writing to AI and then guiding and correcting it regularly. Testing also needs to move earlier in the process. One needs to think further in advance about the various cases, environments, and requirements for open-ended answers, and even about alignment, such as how to match the style that users need.
Yao Shunyu's doctoral thesis in 2019 predicted today, but he thinks he "didn't think big enough"
Yao Shunyu revealed that he reread his doctoral thesis and felt like he had returned to a very ancient era. The title of his doctoral thesis is "Language Agents: From Next-Token Prediction to Digital Automation". It was in 2019, seven years ago, in the era of GPT-2. At that time, the model could only do next-token prediction, and the text it generated was often incoherent or full of glitches. So it was very difficult for people to imagine that it would one day become a world-changing force.
At that time, Yao Shunyu had a wild imagination. He thought that GPT was a very beautiful thing, and predicting the next token was a very minimalist and very general thing. He thought that one day its potential would not only lie in predicting the next token but also in automating all things in the world. At that time, he was thinking about digital automation, but now it seems that it might be digital and physical automation.
During his doctoral studies, Yao Shunyu mainly worked on two things. The first was how to establish a methodology for Agents, that is, how to turn a next-token prediction machine into an Agent, an automated machine. The most important piece of work here might be ReAct.
He still remembers one night in July 2022, when he first connected the API of PaLM 2 with a Wikipedia API he had written himself. For the first time, the model could answer questions based on web pages and carry out multi-round interactions, and he felt as if a faint filament had suddenly lit up. As far as he knows, this might have been the first time that humans connected an LLM with the real Internet and carried out such multi-round interactions.
He felt at that time that this might change the world in five or ten years. But it might be faster than expected. When he first proposed SWE-bench, he thought that if this could be achieved, it would obviously bring great value. At that time, it might be tens of billions or hundreds of billions, but now it might be trillions or dozens of trillions. Maybe he still thought too small.
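The multi-round interaction described above, in which a model alternates between reasoning, calling a lookup tool, and reading the result, can be illustrated with a minimal ReAct-style loop. This is only a sketch under stated assumptions, not the original implementation: `scripted_model` is a hypothetical stand-in for an LLM API call, `TOY_WIKI` and `search` are hypothetical stand-ins for a Wikipedia-style API, and the `Thought:` / `Action:` / `Observation:` text format is an illustrative convention.

```python
import re
from typing import Callable, Dict

# Hypothetical stand-in for a Wikipedia-style lookup API (assumption).
TOY_WIKI: Dict[str, str] = {
    "Go": "Go is an abstract strategy board game for two players.",
    "AlphaGo": "AlphaGo is a computer program that plays the board game Go.",
}


def search(query: str) -> str:
    """Return the toy article for `query`, or a not-found message."""
    return TOY_WIKI.get(query, f"No article found for '{query}'.")


def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM call (assumption).

    It chooses the next step by counting the observations seen so far.
    """
    seen = transcript.count("Observation:")
    if seen == 0:
        return "Thought: I should look up AlphaGo first.\nAction: search[AlphaGo]"
    if seen == 1:
        return "Thought: It plays Go, so I should check what Go is.\nAction: search[Go]"
    return "Thought: I have enough information.\nAction: finish[a board game]"


# Matches lines such as "Action: search[AlphaGo]".
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def react_loop(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate model steps and tool calls until the model emits finish[...]."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            break  # the model produced no parsable action
        name, arg = match.groups()
        if name == "finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Unknown tool '{name}'."
        # Feed the tool result back so the next model step can use it.
        transcript += f"Observation: {observation}\n"
    return "no answer"


answer = react_loop(
    "What kind of game does AlphaGo play?", scripted_model, {"search": search}
)
print(answer)
```

In a real system the scripted function would be replaced by a call to a language model and the dictionary by a live search or browsing tool; the loop structure itself stays the same.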
The second was how to define the tasks of digital automation. For example, WebShop was the first web agent task, and InterCode and SWE-bench were the earliest coding agent tasks. Now it seems that the two most important parts of the foundation of Agents are indeed web agents and coding agents.
Yao Shunyu said that the end of his doctoral thesis, the future work section he wrote in 2024, listed four directions: first, training models for agents; second, safe and robust deployment; third, scientific discovery; and fourth, how to help humans. He remarked that he is very lucky to be working on the future directions he listed back then. He thought he had thought big enough at the time, but maybe it was still not big enough.
03 About Agents
Agents and Coding Agents have become the basic capabilities of model companies
Yao Shunyu believes that today, Agents, especially Coding Agents, are like pre-training, which is a basic capability that every model company has to have. The essence of Coding Agents is that when a model can control the file system and has a container, it is close to a complete system.
But he also emphasized that doing Coding Agents well requires far more than coding data alone, because the most important property of large models is generalization. Tencent's approach will place more emphasis on system comprehensiveness, online feedback, and the exploration of new paradigms.
First, even though Coding Agents may be the most important thing today, Tencent still emphasizes the comprehensiveness of the system. Yao Shunyu has always believed that doing Coding Agents well also requires chatting, searching, instruction following, reasoning, and various other capabilities.
Second, the role of products is becoming increasingly important. How to make good use of online feedback is a problem that every model manufacturer is facing and thinking about. Here, the co-design experience accumulated by Tencent will become very important.
Third, more imagination is needed. Whether it is the evolution of technology, the evolution of products, or even the evolution of the next paradigm, Tencent still needs to do some exploratory and even uncertain work.
The core of cost-effectiveness is performance. Doing simple tasks right at one time is more important than the model architecture
Tang Daosheng mentioned that on the product side, "token anxiety" is being voiced more and more often, and token costs continue to grow explosively. Many customers and users, including colleagues around him, are closely watching their consumption of points or tokens. How can the model achieve the highest token efficiency when solving a given problem or completing a given task?
Yao Shunyu believes that when people in China discuss cost-effectiveness, they may more often discuss the model architecture. But in fact, it is a very complex system. The most important thing is performance. Many people told him that using a stronger model is sometimes more cost-effective than using a weaker model because you can do the thing right faster and save people's energy. So the most important thing is