Robin Li: Being cheap is no longer an advantage of DeepSeek.
Text by | Zhou Xinyu
Edited by | Su Jianxun
Today, advances in models cut both ways for application developers. On the one hand, stronger model capabilities open up more possibilities for real-world scenarios; on the other hand, an application's capabilities may be made obsolete by the models themselves at any time.
What kind of applications will never go out of date?
On April 25, 2025, at the Baidu Create Conference, Robin Li, the founder, chairman, and CEO of Baidu Group, answered: "As long as you find the right scenario, select the right foundation model, and sometimes you may also need to learn some methods of tuning the model, then the applications developed on this basis will never go out of date. It is the applications that truly create value."
In the application layer, Robin Li believes that the most exciting breakthrough applications since 2025 are digital humans and Agents.
Selling digital humans is an important part of Baidu's e-commerce strategy. At the conference, Robin Li unveiled the latest digital-human capability: "highly persuasive digital humans". Besides being more lifelike, they can sense what is happening in the live-streaming room and make intelligent decisions such as giving out red envelopes and switching products.
Agents are Baidu's other key focus in the application layer.
A year ago, Robin Li said that AI coding was the application direction he was most optimistic about. Baidu now has a relatively complete lineup of coding agents: Comate, a programming tool for professional engineers, and "Miaoda", a no-code programming tool for ordinary users.
In general-purpose scenarios, Baidu moved quickly after invitation codes for Manus became extremely hard to get. Just as Manus started charging, Baidu launched a mobile Agent application called "Xinxiang".
In the model layer, DeepSeek is an unavoidable competitor.
"DeepSeek is not omnipotent," Robin Li said bluntly. "DeepSeek does not support multi-modal understanding, has hallucinations, and more importantly, it is slow and expensive."
Targeting these "weaknesses" of DeepSeek, Baidu released new models at the Create Conference: Wenxin Big Model 4.5 Turbo and X1 Turbo, which focus on multi-modality, strong reasoning, and low cost. On cost in particular, 4.5 Turbo costs 40% as much as DeepSeek V3, and X1 Turbo costs 25% as much as DeepSeek-R1.
Finally, Baidu also has ambitions to build an AI application ecosystem.
On the one hand, Baidu Search launched an open platform, inviting application developers to build AI applications for the search ecosystem; on the other hand, Baidu announced support for MCP, the Agent protocol released by Anthropic, which means that models, external tools, and databases that support MCP will be able to interact smoothly.
Baidu's "app-version Manus" is released
The Agent application Manus, released on March 6, 2025, has once again made AI Agents a highly sought-after application area for companies.
Three days before the conference, on April 22, Baidu's first standalone Agent application launched on Android app stores. The app, called "Xinxiang", can be loosely understood as Baidu's mobile take on Manus.
Users only need to enter their requirements in the "Xinxiang" app, and the Agent can execute and deliver the tasks.
"Xinxiang" creates picture books. Source: Baidu
According to Huang Jizhou, chief architect of Baidu's intelligent agent business and head of the Xinxiang app, "Xinxiang" is built on the Agent Use protocol proposed by Baidu. MCP, the Agent protocol proposed earlier by Anthropic, is aimed at tool invocation.
"Xinxiang", by contrast, invokes intelligent agents. Based on the user's request, its main agent schedules third-party agents and Baidu's own agents, task by task, to carry out and deliver the work.
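The scheduling idea described above can be sketched roughly as follows. This is a minimal illustration, not Baidu's Agent Use protocol, whose details the article does not give: the class names, the task types, and the two example agents are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SubAgent:
    name: str                     # hypothetical agent name
    provider: str                 # "baidu" or "third_party"
    handle: Callable[[str], str]  # does the actual work for a request


class MainAgent:
    """Routes each user request to the sub-agent registered for its task type."""

    def __init__(self) -> None:
        self._registry: Dict[str, SubAgent] = {}

    def register(self, task_type: str, agent: SubAgent) -> None:
        self._registry[task_type] = agent

    def run(self, task_type: str, request: str) -> str:
        agent = self._registry.get(task_type)
        if agent is None:
            raise LookupError(f"no agent registered for task type {task_type!r}")
        # Delegate execution, then deliver the result tagged with its source.
        return f"[{agent.provider}:{agent.name}] {agent.handle(request)}"


main = MainAgent()
main.register("picture_book",
              SubAgent("book_maker", "baidu", lambda r: f"picture book for: {r}"))
main.register("travel",
              SubAgent("trip_planner", "third_party", lambda r: f"itinerary for: {r}"))

print(main.run("travel", "three days in Wuhan"))
```

The point of the sketch is only the division of labor: the main agent owns routing and delivery, while each registered agent, in-house or third-party, owns one kind of task.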
In the view of Li Yuxin, the product manager of the Xinxiang app, rebuilding users' mental models is the biggest difficulty Baidu encountered when developing Agent products.
At the media briefing, he said that the user mental model Baidu had built through search was instant delivery. Under that expectation, "AI applications will definitely sacrifice some effects, such as reducing the number of model invocations through caching, etc.", which is also why most Agent products on the market that emphasize instant delivery cannot deliver high-quality results.
Li Yuxin believes that what "Xinxiang" needs to build instead is a "hosting" mental model, in which users hand a task over and wait for the result. Similar to Manus's visible task panel, "Xinxiang" presents an analysis flow during execution that shows users the steps being taken and the time they take.
Currently, Xinxiang already supports more than 200 types of tasks, covering the main scenarios of work, study, and life, such as question explanation, travel, blind dates, medical consultations, and legal consultations.
Huang Jizhou revealed that in the future, "Xinxiang" plans to expand the supported task types to over 100,000. Meanwhile, the PC version of "Xinxiang" is also under development.
The new reasoning model that can draw costs only 25% as much as DeepSeek
Beyond an overall performance improvement, the advantages of Baidu's newly released Wenxin 4.5 Turbo and X1 Turbo over DeepSeek V3 and R1 lie mainly in multi-modal capabilities and low cost.
Performance evaluation of Wenxin 4.5 Turbo. Source: Baidu
Performance evaluation of Wenxin X1 Turbo. Source: Baidu
Among these, Robin Li emphasized multi-modal understanding. He believes that multi-modality will be standard in future foundation models. "The market for pure-text models will become smaller and smaller, while the market for multi-modal models will become larger and larger."
Both Wenxin 4.5 Turbo and X1 Turbo support image and video understanding.
For example, given a blurry photo of a football match, Wenxin 4.5 Turbo can identify it as the Argentina vs. England quarter-final at the 1986 FIFA World Cup in Mexico from elements such as the surrounding billboards and the players' actions.
The image-understanding ability of Wenxin 4.5 Turbo.
In addition to multi-modal understanding, the two models also support multi-modal generation.
For example, when you input "I heard there is something called 'garlic bird' in Wuhan. Please draw it" in Wenxin X1 Turbo, X1 Turbo can generate a cartoon image of the garlic bird based on the information retrieved from the Internet.
As for pricing, Wenxin 4.5 Turbo costs only 20% as much as Wenxin 4.5 and 40% as much as DeepSeek V3: 0.8 yuan per million input tokens and 3.2 yuan per million output tokens. X1 Turbo costs only 25% as much as DeepSeek-R1: 1 yuan per million input tokens and 4 yuan per million output tokens.
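To make the pricing arithmetic concrete, here is a small sketch that uses only the per-million-token prices and percentages quoted above. The comparison prices it derives are merely what those percentages imply, not prices quoted in the article, and the function names and the token counts in the example are illustrative assumptions.

```python
from decimal import Decimal

# Prices in yuan per million tokens, as quoted above: (input, output).
PRICES = {
    "Wenxin 4.5 Turbo": (Decimal("0.8"), Decimal("3.2")),
    "Wenxin X1 Turbo": (Decimal("1"), Decimal("4")),
}


def implied_price(model: str, ratio: Decimal):
    """Comparison-model price implied by 'model costs `ratio` times as much'."""
    inp, out = PRICES[model]
    return inp / ratio, out / ratio


def cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in yuan of one workload at the quoted prices."""
    inp, out = PRICES[model]
    million = Decimal(1_000_000)
    return inp * input_tokens / million + out * output_tokens / million


# Implied (input, output) prices of the comparison models.
print(implied_price("Wenxin 4.5 Turbo", Decimal("0.4")))   # vs. DeepSeek V3
print(implied_price("Wenxin X1 Turbo", Decimal("0.25")))   # vs. DeepSeek-R1

# Hypothetical workload: 2M input tokens and 0.5M output tokens.
print(cost("Wenxin 4.5 Turbo", 2_000_000, 500_000))
```

`Decimal` is used instead of floats so that prices such as 0.8 and 3.2 stay exact when divided and summed.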
Baidu's e-commerce: being an upstream "water-seller"
AI has reignited Baidu's confidence in e-commerce.
Ever since the "Baidu Preferred" entrance launched on the Baidu app in May 2023, Baidu's e-commerce positioning has not been to compete with large shelf-based e-commerce platforms such as Taobao and JD.com.
Ping Xiaoli, vice president of Baidu and general manager of its e-commerce business, said that Baidu's e-commerce has two roles. On the one hand, it is an integral part of the Baidu app's services, meeting the consumption needs of search users; on the other hand, it uses intelligent tool services to become an upstream "water-seller" in the e-commerce industry.
Digital humans are the "water" that Baidu's e-commerce sells. The "highly persuasive digital humans" released this time have been optimized in lifelikeness, cost, style, and more. Most importantly, they can sense what is happening in the live-streaming room and interact in real time, avoiding the awkwardness of traditional digital humans that replay pre-recorded content over and over.
For example, when the number of viewers reaches 500,000, red envelopes are given to the audience; in response to questions from users in the live-streaming room, the digital human can flexibly adjust its slides and switch materials.
"Highly persuasive digital humans". Source: Baidu
Real-time interaction is powered by multi-agent scheduling. According to Ping Xiaoli, several intelligent agents, such as an anchor expert, an operation expert, and a field-control expert, work behind each highly persuasive digital human and are flexibly scheduled according to real-time popularity and conversion rate in the live-streaming room.
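As a rough sketch of what such scheduling could look like, the snippet below decides, for each update of the room state, which expert agent should act. Only the 500,000-viewer red-envelope example and the three agent roles come from the article; which agent owns which action, the `RoomState` fields, and the fallback behavior are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

RED_ENVELOPE_VIEWERS = 500_000  # threshold from the article's example


@dataclass
class RoomState:
    viewers: int
    pending_questions: List[str]
    red_envelope_sent: bool = False


def schedule(state: RoomState) -> List[str]:
    """Return this tick's actions, each tagged with the agent assumed to own it."""
    actions: List[str] = []
    if state.viewers >= RED_ENVELOPE_VIEWERS and not state.red_envelope_sent:
        actions.append("operation_expert: send red envelopes")
        state.red_envelope_sent = True  # send only once per milestone
    for question in state.pending_questions:
        actions.append(f"field_control_expert: switch material for {question!r}")
    if not actions:
        # Nothing special is happening, so the anchor keeps presenting.
        actions.append("anchor_expert: continue product pitch")
    return actions


room = RoomState(viewers=500_000, pending_questions=["Is it waterproof?"])
for action in schedule(room):
    print(action)
```

A production system would presumably also weigh conversion rate, as the article mentions; it is left out here to keep the rule set small.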
Baidu Wenku, with over 97 million monthly active AI users, wants to combine models
Baidu Wenku, which has integrated Baidu Netdisk, has delivered its half-year report card: paid users exceed 40 million, and monthly active users exceed 97 million.
Within Baidu, Baidu Wenku is a standout example of applying model capabilities. Wang Ying, vice president of Baidu and head of Baidu Wenku and Baidu Netdisk, previously told "Intelligent Emergence" that Wenku was the earliest AI application to develop its own MoE (Mixture of Experts) architecture.
Building on a combination of multiple models has now become common practice for AI applications. Robin Li believes that combining models within an application is commonplace, but that how to combine and invoke them remains a technical skill.
For this reason, Baidu Wenku and Netdisk launched a technical base: Cangzhou OS.
Cangzhou OS.
In order to enable different models to understand and generate different contents, this base is mainly divided into two layers:
The first layer is Chatfile Plus. It "vectorizes" content of different modalities, forms, and formats; that is, it translates the content into vectorized tokens that large models can understand and then use for mixed generation.
The second layer consists of "three libraries + three tools": a public-domain knowledge library, a private-domain knowledge library, and a memory library, plus an editor, a reader, and a player. Large models can combine and invoke these components according to users' requirements.
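The two layers can be pictured with a toy sketch like the one below. It is not how Cangzhou OS or Chatfile Plus is implemented, which the article does not describe: the hash-based `toy_vectorize` merely stands in for real learned embeddings, and the class and key names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def toy_vectorize(content: bytes, dim: int = 8) -> List[float]:
    """Layer 1 stand-in: map any content (text, image bytes, ...) to a
    fixed-length vector. A real system would use learned embeddings."""
    digest = hashlib.sha256(content).digest()
    return [byte / 255 for byte in digest[:dim]]


def _empty_libraries() -> Dict[str, List[List[float]]]:
    return {"public_knowledge": [], "private_knowledge": [], "memory": []}


@dataclass
class Base:
    """Layer 2 stand-in: three libraries plus three tools a model can invoke."""
    libraries: Dict[str, List[List[float]]] = field(default_factory=_empty_libraries)
    tools: Tuple[str, ...] = ("editor", "reader", "player")

    def ingest(self, library: str, content: bytes) -> None:
        # Content of any format is vectorized first, then stored.
        self.libraries[library].append(toy_vectorize(content))


base = Base()
base.ingest("private_knowledge", "lecture notes on MoE".encode("utf-8"))
base.ingest("private_knowledge", b"\x89PNG placeholder image bytes")
print(len(base.libraries["private_knowledge"]), base.tools)
```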
Based on this OS, Baidu Netdisk released a new function, AI Notes.
In Baidu Wenku's view, a pain point for learners is the lack of connection between their notes and the original learning materials. For example, when users review from notes, they have to spend extra effort locating the related text, video, and picture materials.
The core functions of AI Notes are time tracing and multi-modal organization. For example, given a video lecture saved in Baidu Netdisk, AI Notes can work out the logical structure and writing order of the entire video through content understanding and generate a mind map.
In the mind map, each knowledge point carries a timestamp that links directly back to the corresponding moment in the video.
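One simple way to picture such a mind map is a tree whose nodes each store the video position they came from. The sketch below is an assumption about the data shape, not AI Notes' actual format; the node titles and timestamps are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    title: str
    start_seconds: int  # where this knowledge point begins in the video
    children: List["Node"] = field(default_factory=list)


def fmt(seconds: int) -> str:
    """Format a position in seconds as HH:MM:SS."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def outline(node: Node, depth: int = 0) -> List[str]:
    """Flatten the mind map into indented lines, each with its jump-back time."""
    lines = [f"{'  ' * depth}{node.title} @ {fmt(node.start_seconds)}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


lecture = Node("Intro to MoE", 0, [
    Node("Why sparse experts", 95),
    Node("Routing", 3725, [Node("Top-k gating", 3790)]),
])
print("\n".join(outline(lecture)))
```

Clicking a node in a real interface would then seek the player to `start_seconds`.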
The "AI Notes" function of Baidu Netdisk.
Baidu also plugs into MCP, the "AI universal socket"
MCP is an Agent protocol launched by the US model manufacturer Anthropic.
Just as Qin unified the currency, the function of the protocol is to unify the development standards between software. Software that supports the MCP protocol can adapt to and call one another more flexibly. For example, many financial companies use MCP to enable AI to better understand the context of financial data.
Supporting MCP has also become a "stealth battle" among vendors to attract more third-party applications and build an AI ecosystem. For example, Alibaba Cloud's AI development platform "Bailian" launched MCP services, and Tencent Cloud also announced that its large-model knowledge engine supports the MCP protocol.
In Robin Li's view, MCP is like installing a universal socket for AI: it improves the efficiency of adapting, developing, integrating, and maintaining different AI software. MCP matters especially for Agents, because it lets them freely call any third-party tool that supports it.
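To give a sense of what the "socket" looks like on the wire: MCP messages are JSON-RPC 2.0, and a client discovers and calls a server's tools with the `tools/list` and `tools/call` methods. The sketch below only builds such request messages. The tool name `search_route` and its arguments are hypothetical, and transport and session initialization are omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids


def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request string of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it offers.
list_tools = mcp_request("tools/list")

# Call one tool by name; "search_route" is a made-up example tool.
call_tool = mcp_request("tools/call", {
    "name": "search_route",
    "arguments": {"from": "Beijing", "to": "Wuhan"},
})

print(list_tools)
print(call_tool)
```

Because every MCP server accepts the same message shape, an Agent needs only one client implementation to reach any compliant tool, which is the sense of the "universal socket" comparison.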
Currently, "Qianfan", Baidu Smart Cloud's large-model platform, is compatible with MCP. Baidu Search has also built an index platform for MCP Servers. Applications such as Wenxin Kuaima, Baidu e-commerce, maps, Netdisk, and Wenku also expose their capabilities externally as MCP Servers.