Robin Li's internal speech: "Wen Xiaoyan" doesn't need to be promoted as aggressively as Doubao and Kimi | 36Kr Exclusive
Written by Zhou Xinyu
Edited by Su Jianxun and Yang Xuan
"Intelligent Emergence" exclusively learned that recently, Baidu held the director meeting for the third quarter of 2024, attended by Robin Li, the founder, chairman and CEO of Baidu Group, and managers above the director level within Baidu.
The third quarter is also known as Baidu's strategic quarter. In Robin Li's nearly ten-thousand-word speech, AI remains the protagonist. Centered on AI, Robin Li mentioned the development strategies of businesses such as search, digital humans, intelligent agents, large model invocation, and Apollo Go.
More importantly, in the current situation where AI resources are scarce and it is still early for generating revenue, Baidu also needs to make trade-offs in the development strategies of its businesses. For example, Robin Li mentioned that Baidu will not engage in video generation like Sora, as "it may be difficult to commercialize it in 10 or 20 years".
He also pointed out that the ToB business should avoid the thankless project-based model wherever possible and launch standardized products instead. As for "Wen Xiaoyan", the new ToC search application renamed from Wenxin Yiyan, it will not be promoted as aggressively as ByteDance's Doubao or Moonshot AI's Kimi.
The following is a summary of Robin Li's speech at Baidu's director meeting for the third quarter of 2024, compiled by "Intelligent Emergence":
Search: The combination of search and large models points toward intelligent agents
Robin Li believes that the reconstruction of search is progressing relatively slowly, but he also understands that the business has evolved over a long history and that employees' thinking is hard to change in a short period of time.
At present, he believes that the combination of search and large models should focus more on intelligent agents. In his opinion, the intelligent agent is not yet an industry consensus but a bet made by Baidu: "We believe that the intelligent agent will be a new carrier of content, services, and information in the future, and even the main carrier."
But based on this assumption, Robin Li believes that the intelligent agent, like the previous video stream or graphic and text stream content ecosystem, will face the same problem: How do you distribute it?
In his opinion, the answer is that search remains a main channel: "Because the form of the intelligent agent is a conversation flow. If a product relies on swiping up and down, it cannot combine and interact with the intelligent agent."
Regarding the development trend of the intelligent agent, Robin Li judges that as the basic model becomes more powerful, the threshold for building intelligent agents will keep falling. At the same time, the ceiling for intelligent agents is very high, because technologies such as agent self-reflection, evolution, and tool use are still at a very early stage, and collaboration among multiple intelligent agents has not yet been put into practice in the industry.
Therefore, Robin Li believes that with in-depth study of individual scenarios, there is still much that intelligent agents can build, and room for imagination remains.
He also mentioned that the intelligent agent has initially verified its commercial value: "Today, we have hundreds of thousands of advertisers, and tens of thousands of them have been trying out commercial intelligent agents to improve their advertising results and conversion rates and to better reach and communicate with their target customers, and they are willing to pay real money for it."
Based on the new understanding of search, Robin Li believes that the reconstruction and reestablishment of search should be divided into two steps:
Search and recommendation integration:
Any changes in search should not only consider the impact on the core business indicators in the search scenario, but also the impact on the core business indicators of Feed.
Empty-box recommendation (such as the words and phrases preset in the Baidu search box) is in essence a form of recommendation, because the text is not entered by the user; it can only be handled correctly by applying recommendation thinking.
AI should be combined with the mobile ecosystem:
AI needs to be further integrated with the mobile ecosystem, for example by combining the graphical user interface and the natural language interface more naturally, which will be a paradigm for the future development of search.
Digital Humans: The future mainstream product form is the interaction between real humans and virtual humans
In Robin Li's view, the mainstream interaction form in the PC and mobile Internet era is the interaction between real humans, and the representative product is WeChat.
"It is difficult for us to imagine what practical value the interaction between virtual humans has," Robin Li mentioned, "So I think the value lies in the interaction between real humans and virtual humans."
But he also admitted that finding use cases for interaction between real humans and virtual humans requires a period of exploration, and that this process will be painful. Baidu has been experiencing that pain since this year: virtual humans do not yet perform well enough, and forcing them into products at this stage would damage the user experience.
However, Robin Li remains optimistic that technological progress will drive product improvement, and that the experience of interacting with virtual humans may one day even surpass that of interacting with real humans.
As for application scenarios for virtual humans, Robin Li mentioned live streaming. He believes that e-commerce live streaming is a very mainstream product form this year, and he is considering whether the characteristics and capabilities of top influencers ("big Vs") such as Dong Yuhui and Xin Ba can be replicated by digital humans: "There is still some room for imagination here."
Robin Li gave an example of a feasible scenario: a large part of Baidu's e-commerce live streaming is already digital human live streaming, with scripts generated entirely by AI. In practice, there is a lot of lengthy data that human anchors may not be able to fully remember, but digital humans have no problem with memory and even outperform real humans in this respect.
In addition to AI script generation, Robin Li believes that interaction is another relatively important scenario, even though this is still difficult for digital humans.
Robin Li also sees more product forms for virtual humans. In addition to live streaming, there is also video. He mentioned that digital human live streaming should be benchmarked against what real humans achieve in videos rather than in live streams, because in theory digital humans have undergone extensive training and polishing, just as many high-quality mainstream videos have gone through repeated polishing and reshooting.
Intelligent Agents: More synthetic data will be used for training in the future
Robin Li believes that digital human live streaming and intelligent agents share the same origin, because digital human live streaming has basic elements such as its own knowledge base and workflow. Combining intelligent agents with multi-modal technology may be the future evolution direction of digital humans.
He mentioned that Baidu has attached importance to the technological development of intelligent agents since the fourth quarter of 2023, and this year the field has become increasingly popular. He observed that the o1 model released by OpenAI is built on reinforcement learning and reflects OpenAI's expectations for intelligent agents. In his view, this indicates that the training paradigm has returned from Transformer to reinforcement learning, which means that a good reward model needs to be designed.
Currently, there are more and more doubts about the Scaling Law, but Robin Li believes that in the Chinese market, much valuable data has in fact not yet been truly applied to training, such as live streaming-related data and multi-modal data.
In his opinion, reinforcement learning, like the Scaling Law, faces bottlenecks in computing power and data. In the future, more training data will be supplemented by synthetic data, which needs to be synthesized based on a specific understanding of the technology or the scenario.
Robin Li judges that in the future, intelligent agents can greatly improve human work efficiency, but releasing their potential still requires many skills. In March 2023, Robin Li said that 50% of human work would ultimately be prompt engineering. He still holds the same view now. Polishing prompts is one of the skills for releasing the potential of the intelligent agent.
Furthermore, Robin Li mentioned that there is a very important concept in the context of the intelligent agent, called "workflow". A workflow is simply "a routine". If the routine can be clearly broken down into steps, it becomes a workflow, and in the future AI and machines can automate it.
"Most of the methodologies in the world today have not actually been digitized." Robin Li believes that there is still a lot of value to be released.
The Invocation of Large Models: AGI is Baidu's long-term goal
Robin Li mentioned that some new points of consensus have emerged around the invocation of large models. For example, small models distilled from large models are very competitive among small models and have stronger capabilities than small models trained from scratch.
At present, Robin Li attaches more importance to API invocation volume: the greater the volume, the more feedback there is to improve the basic model, and the volume also reflects the market's recognition of Baidu's basic model capabilities.
At the same time, Robin Li also mentioned some non-consensus judgments:
The quality of API invocation is more important than the quantity. If only invocation quantity is emphasized, it easily leads to cheating;
The overall effect of the large model is still better than that of the fine-tuned small model. If the requirements for response speed and inference cost are very strict, a fine-tuned small model may be more suitable; but if latency is not critical and the best results are desired, the large model should be used.
He also emphasized that AGI is Baidu's long-term goal. Robin Li believes that AGI cannot be achieved within half a year or one year.
Therefore, Baidu needs to make trade-offs along the way. Robin Li mentioned that in the short term, the large model still needs to be optimized for specific scenarios. He does not pursue a unified, general-purpose large model that tops the rankings, but wants to see whether Baidu's model surpasses its competitors in application scenarios, and whether it really achieves better results and higher efficiency than humans do.
Regarding team organization, he believes that Baidu needs internal and external collaboration, such as sharing R&D resources and jointly bearing R&D costs, and that it must be determined to lead in the core scenarios Baidu has chosen rather than pursue a completely general and powerful version.
Apollo Go: The data flywheel should be as simple as possible
Robin Li believes that Apollo Go is already at the global forefront.
He mentioned that there has long been a debate about the technical route to L4: one is the so-called end-to-end, pure-vision route championed by Tesla, and the other is the rule-based route. In his opinion, both routes have their merits, and what matters is who achieves L4 first.
For example, the open questions are whether Tesla's solution can achieve fully unmanned driving in Wuhan today, and whether it can surpass the hierarchical end-to-end approach in two to three years. Robin Li believes that it can be achieved within two or three decades, but where the inflection point lies determines which technical route Baidu should adopt now.
In response to the debate in July 2024 over Apollo Go replacing human drivers, Robin Li also shared two thoughts:
Throughout history, industrial revolutions have eliminated the hardest jobs at the bottom and created more comfortable and dignified ones. For example, today no one carries sedan chairs anymore, and there are no coachmen either.
Robin Li believes that innovation is to replace the hardest jobs and transfer the labor force to less hard jobs. In general, the progress of technology is relatively positive.
In the era of artificial intelligence, the opportunities brought by AI and large models should also include opportunities for organizational innovation and process innovation.
Robin Li believes that the data flywheel is a necessary and sufficient condition for the success of AI-native applications, but there are still many things that people do not understand clearly:
For example, the data flywheel should be based on the know-how or data in a specific field, but is the business process really continuously generating the knowledge and data in this field? Are you consciously doing this? Robin Li believes that the industry awareness in this aspect is not so strong.
At the same time, he believes that the data flywheel should be as simple as possible, because the more nodes it has, the slower and more complicated it turns, and the scale of each flywheel is not large. So Robin Li hopes that less design is more, and simplicity is complexity.
Resource Allocation: The basic model should be at least half a generation ahead of its peers in China
Robin Li believes that Baidu's current resources are mainly focused on making its strengths stronger. Shortcomings should be addressed only when leaving them unaddressed would prevent those strengths from being realized.
In his opinion, the key points of Baidu's resource allocation at present are as follows:
The basic model should be at least half a generation ahead of its peers in China. Baidu will continue to invest in the basic model;
In key scenarios, it should surpass the competitors, and be able to create value for the business and products, rather than indiscriminately improving the general capabilities of the basic model;
In the new round of organizational adjustment, HCG (Health Care Group) has been merged into MEG (Mobile Ecosystem Group), so the overall efficiency will be higher;
The content ecosystem should be built to be stable and distinctive enough to support many other things. For example, the intelligent agent can be called "poetry and the distance", but for now it is necessary to establish user mindshare: let creators know what benefits they can get from creating or submitting content on Baidu, and let users perceive what kind of content they can see on Baidu and what kind of content Baidu is good at.
Among them, Robin Li particularly mentioned the plan for ACG (Intelligent Cloud Business Group):
First, the ToB business must be standardized. Standardization stands in contrast to the project-based model, which involves many custom requirements, requires many on-site personnel to be dispatched, and demands a lot of back-end R&D and adaptation.
For standardized products like Comate, although they cannot sell for much money now and are not yet competitive enough, Robin Li thinks it doesn't matter. It is acceptable for the product to start from a relatively low point, because as long as continuous investment raises its threshold and widens the gap with competitors, it will still be a good direction in the future.
Secondly, ACG should focus on mid-range customers. Robin Li believes that it is often difficult to make much money from very large customers that serve as benchmarks, and the very long-tail customers are not easy to serve either, because they do not have much money to spend.
Trade-offs: Don't do video generation, no need for aggressive promotion
Finally, Robin Li mentioned strategic trade-offs, which is also a summary of the content of the director meeting. First are the four "takes":
Continue to insist on investing in the training of the next generation of models;
Continue to build the ecosystem of intelligent agents, although this is not yet an industry consensus;
Develop the intelligent cloud with API invocation as the traction;
Apollo Go should continue to expand in scale.
Then there are three "sacrifices":
The investment cycle of video generation like Sora is too long, and it may not be possible to obtain business benefits in 10 or 20 years. Therefore, no matter how popular it is, Baidu will not do it;