
What gives Baidu the confidence to lead AI towards a future where it is "capable of everything and present everywhere"?

晓曦 · 2025-04-25 18:38
The foundation of super productivity lies in the operating system.

In the domestic AI field, a covert battle around MCP has quietly emerged.

There will be no repeat of the large-scale bubble seen in the "Hundred-Model War". Instead, attention has shifted to long-term value: whether AI can actually be deployed and put to good use, and whether it can build an ecosystem moat. These are points of consensus among MCP participants.

Today, technology giants are gearing up. Their goal is not only to gain an early advantage in this "new battlefield" but also to jointly help accelerate the evolution of AI applications.

Regarded as the "universal plug" for intelligent agents, MCP (Model Context Protocol) does more than apply a framework-based approach to link multimodal applications with a wide range of data sources. It can also support a genuinely open ecosystem in which every enterprise and developer can create their own applications, give users high-quality, personalized deliverables, and significantly raise AI productivity.

Against this backdrop, on April 25th, at the Create2025 Baidu AI Developers Conference, Robin Li, the founder, chairman, and CEO of Baidu, announced that Baidu Wenku and Baidu Netdisk had jointly launched "Cangzhou OS". This is also the world's first operating system in the content field, and it consolidates AI capabilities into a system-level technical foundation. Building on the characteristics of an OS and the value of MCP, it realizes the transition from in-depth thinking to in-depth delivery.

If data is the energy of the AI era and models are its productivity engines, then the OS is a "super factory" that connects technology, data, tools, and end-to-end delivery requirements, letting users sense that in the era of large models AI is becoming "omnipotent and ubiquitous".

01

The correct direction for AI is to be "omnipotent and ubiquitous"

The year 2024 is known as the first year of AI applications. The multimodal content produced by many AI applications is basically "usable". Most users have stopped observing from the sidelines and started to look for AI delivery scenarios that meet their own needs.

However, "usability" is not the ultimate goal. The AI era still needs to move one stage further: users need AI that is more "practical" and more "user-friendly". How can this be achieved?

Before answering this question, it is worth considering which pain points keep AI applications from being "user-friendly" enough.

Firstly, large models have become very popular and have fully entered the stage of in-depth thinking. AI can suggest strategies, but there are still very few cases where it actually does the work and delivers a result.

Most AI applications on the market still cannot connect context or make calls across platforms. In such a closed environment, users' patience is tested by multi-round conversations and by the hit-or-miss nature of generation. Uneven quality control also makes users reluctant to rely on these applications in more professional and complex delivery scenarios.

Secondly, from ChatGPT to DeepSeek, the threshold for writing prompts has dropped significantly, but results still depend on users' own logical expression and data organization, which leaves them with a considerable input burden and cost.

Thirdly, until there is a mature solution for multimodal input and output, AI can think "smartly" but cannot "do the work". Users are limited by the single-point functions of individual AI applications in handling different material modalities. They often have to switch between applications and platforms, and their train of thought is frequently interrupted. The capability ceiling of AI Agents such as Manus is clearly constrained in this respect.

Facing these clear challenges, improving model capabilities no longer translates directly into a leap in AI delivery capabilities. Baidu Wenku and Baidu Netdisk have also recognized that users do not want to learn how to use AI. They want to obtain high-quality deliverables reliably, without having to "learn about AI or precisely master the structure of prompts". Moreover, being able to generate multimodal content end to end, quickly, at any time, in any place, and on any terminal, from any instruction or multimodal file, matches the public's expectation that AI be "smart and capable".

It is at this critical juncture from quantitative change to qualitative change that "Cangzhou OS" points out the correct direction: to make AI move towards being "omnipotent and ubiquitous".

02

A good system requires end-to-end high-quality delivery

Under MCP's value system of "connection", achieving end-to-end delivery in every application scenario requires value innovation at each link: technology, tools, and services.

End-to-end delivery therefore means a one-stop, system-level, complete generation experience. Just as the arrival of Windows meant that users no longer had to write code to make a computer work, an AI-native operating system is needed so that the large user bases of Baidu Wenku and Baidu Netdisk can all use AI to do work on an "equal" footing.

For example, the Free Canvas that Baidu Wenku and Baidu Netdisk jointly launched last year is an operating system that changes the Chatbot interaction mode and enables multimodal understanding and generation as well as parallel multi-task collaboration. It was created to lower the threshold for using AI and can be regarded as a beta version of "Cangzhou OS".

Robin Li said at Create2025: "The ability to use multiple models in combination on the Free Canvas has now been consolidated into a complete technical foundation: Cangzhou OS. This is an operating system born for content."

The problem-solving approach of "Cangzhou OS" is simpler and more direct: make AI evolve at the system level. That is, when users feed materials and instructions of any modality into the system, they can, through more flexible interaction and operation, directly obtain deliverable content in any modality. This is no longer a single engineering solution but an end-to-end system.

An end-to-end system should meet at least three requirements: first, users should face no input threshold; second, the toolchain and context memory should be complete and open; third, mixed multimodal input and output should be accurate and rich.

Once such an operation scenario is verified, it will quickly challenge and iterate the existing OS productivity system. To surpass the traditionally defined operating systems, the operating systems in the AI era need to make breakthroughs in three aspects: more personalized content, more convenient interaction, and more comprehensive tools.

The three-layer architecture of "Cangzhou OS" corresponds to these aspects one by one.

At the infrastructure level, "Cangzhou OS" has built "three major libraries": the public knowledge base that Baidu Wenku has accumulated over the years, the private knowledge base that Baidu Netdisk users have authorized, and users' memory libraries. Together with strong knowledge processing and extraction capabilities, these let users retrieve and call knowledge data during tasks without any threshold.

The public and private knowledge bases are the moat of Baidu Wenku and Netdisk: the massive knowledge accumulated in Wenku supports reasoning and makes multimodal output more professional and reliable, while the knowledge in Netdisk brings generated content closer to users' personal needs and ideas.

In the central system, bridging the "gap" in efficiency scenarios requires input and output, and production and collaboration, to be tightly integrated and easy to operate. This is also where all future operating systems and AI terminals will focus their efforts. "Cangzhou OS" has therefore built "three major devices": the self-developed readers, editors, and players of Baidu Wenku and Baidu Netdisk, each with integrated AI capabilities. In addition, a "scheduling center" combines users' memory and profile data, uses interaction components, intention models, and transmission infrastructure to fully understand users' intentions, and coordinates multiple models and multiple intelligent agents so that they work in parallel and are scheduled efficiently.

In terms of application services, returning to the nature of the operating system itself, based on the MCP protocol, "Cangzhou OS" integrates hundreds of AI Agents from Wenku and Netdisk. The generated modalities cover various types of materials such as pictures, charts, documents, audio, and video, comprehensively covering diverse practical scenarios such as study, work, life, and entertainment, and also has the ability to expand flexibly.

Compared with PC and mobile operating systems, "Cangzhou OS" well reflects the value characteristics of an AI OS, allowing the personalization of data and the diversity and accuracy of models brought by MoE to flow into various scenarios and terminals, exploring the true meaning of "ubiquity".

At the same time, combined with large-scale public and private knowledge data and hundreds of AI capabilities that users have validated over a long period, "Cangzhou OS" can serve both general and niche demand scenarios, and thus has the opportunity to become "omnipotent".

03

Being smart and capable of doing work is the only way to verify AI

Robin Li announced at Create2025 that the number of paid users of Baidu Wenku's AI functions has exceeded 40 million and that monthly active users have reached 97 million, making it a genuine "super productivity" tool.

Then, with an "OS" in place, how can it empower such a large user base?

Based on "Cangzhou OS", Baidu Wenku and Baidu Netdisk have launched two new capabilities, "GenFlow Super Partner" and "AI Notes". These two capabilities also show that on the AI OS system, the experience of AI functions and the AI interaction interface can flow flexibly and powerfully like water, exploring the possibility of AI being "omnipotent and ubiquitous" in different application forms.

For example, the GenFlow Super Partner in the Baidu Wenku APP is a comprehensive iteration of WorkFlow in terms of human-machine collaboration. WorkFlow generally refers to a predefined workflow that is fixed and inflexible. GenFlow, by contrast, relies on AI's thinking and planning to autonomously call various models and Agents, such as those for PPTs, documents, mind maps, and posters, and finally outputs multimodal content.
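
The contrast between a fixed workflow and planned tool selection can be illustrated with a toy sketch. Everything in it is hypothetical: the tool names, the `plan_tools` function, and the keyword table are invented for illustration, and simple keyword matching stands in for the model reasoning the article describes. It is not Baidu's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical tools; each returns a label for the artifact it would produce.
TOOLS: Dict[str, Callable[[str], str]] = {
    "ppt": lambda req: f"PPT for: {req}",
    "document": lambda req: f"Document for: {req}",
    "mind_map": lambda req: f"Mind map for: {req}",
    "poster": lambda req: f"Poster for: {req}",
}

# Fixed workflow: the same predefined steps run for every request.
FIXED_WORKFLOW: List[str] = ["document"]

# Toy stand-in for model planning: keywords in the request select tools.
KEYWORDS: Dict[str, str] = {
    "plan": "ppt",
    "invitation": "poster",
    "outline": "mind_map",
    "report": "document",
}


def run_workflow(request: str) -> List[str]:
    """Run the predefined steps regardless of what the request asks for."""
    return [TOOLS[name](request) for name in FIXED_WORKFLOW]


def plan_tools(request: str) -> List[str]:
    """Choose tools from the request text, falling back to a document."""
    lowered = request.lower()
    matched = [tool for word, tool in KEYWORDS.items() if word in lowered]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(matched)) or ["document"]


def run_genflow(request: str) -> List[str]:
    """Run only the tools that the planning step selected."""
    return [TOOLS[name](request) for name in plan_tools(request)]
```

With a request such as "Help me make a wedding plan and a wedding invitation", `run_workflow` always produces a single document, while `run_genflow` selects the PPT and poster tools because the request mentions a plan and an invitation.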

In many "unexpected" scenarios, GenFlow has shown a deep understanding of simple colloquial requests. It can do the work quickly and beautifully, making users, who originally just wanted to give it a try, pleasantly surprised to find that the results can be directly used for delivery.

For example, take the GenFlow input "I'm going to hold a wedding in Hainan during the May Day holiday. Help me make a wedding plan with pictures and texts and a wedding invitation". This simple colloquial instruction implies a complex workflow for the AI to work out: a complete wedding plan requires practical solutions, a large number of user preferences, and customized content, and it also involves producing materials such as invitations.

GenFlow combines local customs with the characteristics of the venue and date, confirms users' preferences, budget, and schedule through multi-round conversations and a review of historical records, and uses model reasoning to decide which multimodal outputs to present. This is GenFlow's reasoning process of "thinking like a human". At the output level, it calls PPT tools, poster design tools, and so on, and can generate a high-quality wedding plan PPT and an invitation poster simultaneously within a few minutes. Users can then edit the generated content directly in the operating system.
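
Generating two deliverables at the same time amounts to running the tool calls concurrently rather than one after another. A minimal sketch with Python's `asyncio` follows. The `generate` coroutine, the tool labels, and the delays are assumptions made for illustration, with `asyncio.sleep` standing in for real model or tool latency.

```python
import asyncio
from typing import List


async def generate(tool: str, brief: str, delay: float) -> str:
    """Hypothetical tool call; asyncio.sleep stands in for model latency."""
    await asyncio.sleep(delay)
    return f"{tool}: {brief}"


async def deliver(brief: str) -> List[str]:
    """Run both generators concurrently; results keep the call order."""
    return await asyncio.gather(
        generate("ppt", brief, 0.02),
        generate("poster", brief, 0.01),
    )


results = asyncio.run(deliver("Hainan wedding, May Day holiday"))
```

`asyncio.gather` returns results in the order the coroutines were passed in, so the PPT result comes first even though the poster call finishes sooner in this sketch.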

At the same time, compared with the multi-agent collaboration products on the market, GenFlow Super Partner is not only an "off-the-shelf" product but also achieves minute-level delivery. Its generation quality is more stable, and it overcomes the weakness of similar products whose output cannot be refined through multi-round conversations.

The second is AI Notes in Baidu Netdisk, which is also the first multimodal AI note-taking product currently on the market. Baidu Netdisk started by asking what makes a "good note". What users want from notes comes down to comprehensive and accurate information, structured presentation of knowledge, complete logic, the ability to consolidate and reuse key knowledge, and in-depth integration of notes with learning materials for easy review and revision.

Most of users' learning materials today span multiple modalities such as video, images, and text. Note-taking products on the market struggle to meet all of the above requirements at once. Baidu Netdisk has identified this pain point, activated users' private learning knowledge bases, and connected learning content smoothly with notes.

For example, when watching learning videos stored in Netdisk, users can use the AI Notes function in the sidebar of the Netdisk playback interface to automatically generate comprehensive, clear, and structured multimodal AI notes that are fully linked to the video content. Users can also generate an AI mind map with one click to get an overview of the video's structure, and generate AI questions from the video content to test their mastery of the material. In the future, users will also be able to add other knowledge content such as textbooks and reference materials to the notes and run an AI-powered web-wide search based on that knowledge to produce more detailed and complete AI notes.

These two capabilities are just the tip of the iceberg of what Baidu Wenku and Netdisk can do. As more enterprises adopt MCP and join system ecosystems such as Cangzhou OS to build their own AI applications and Agents, more single-point capabilities will emerge.

04

Beyond capabilities, the long-term value of an open ecosystem

Allowing more enterprises and developers to join is also the key for the entire AI industry to make the "cake" bigger.

Therefore, to maximize the value of the ecosystem and its applications, Baidu Wenku and Baidu Netdisk have, on the basis of "Cangzhou OS", taken the lead in fully applying MCP to the connection between products and the ecosystem. They have built a three-layer system of MCP Server, Client, and Host, and opened up the capabilities of Wenku and Netdisk as MCP Servers for more enterprises and developers to use.
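
The Server, Client, and Host roles can be sketched in a few lines of plain Python. This is an in-process toy, not the MCP SDK and not Baidu's servers: the classes `ToyServer`, `ToyClient`, and `ToyHost` and the `file_upload` tool are hypothetical, and transport, initialization, and capability negotiation are omitted. Only the JSON-RPC 2.0 framing and the `tools/list` and `tools/call` method names are taken from the MCP specification.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class ToyServer:
    """Exposes named tools; stands in for an MCP server (no real transport)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def tool(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def handle(self, raw: str) -> str:
        req = json.loads(raw)
        method, params = req["method"], req.get("params", {})
        reply: Dict[str, Any] = {"jsonrpc": "2.0", "id": req["id"]}
        if method == "tools/list":
            reply["result"] = {"tools": [{"name": n} for n in self._tools]}
        elif method == "tools/call":
            text = self._tools[params["name"]](**params.get("arguments", {}))
            reply["result"] = {"content": [{"type": "text", "text": text}]}
        else:
            # -32601 is the standard JSON-RPC "Method not found" code.
            reply["error"] = {"code": -32601, "message": "Method not found"}
        return json.dumps(reply)


class ToyClient:
    """Holds one connection to one server and frames JSON-RPC requests."""

    def __init__(self, server: ToyServer) -> None:
        self._server = server
        self._next_id = 0

    def request(self, method: str,
                params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        self._next_id += 1
        raw = json.dumps({"jsonrpc": "2.0", "id": self._next_id,
                          "method": method, "params": params or {}})
        return json.loads(self._server.handle(raw))


class ToyHost:
    """The application (e.g. an assistant) that owns one client per server."""

    def __init__(self) -> None:
        self._clients: Dict[str, ToyClient] = {}

    def connect(self, label: str, server: ToyServer) -> None:
        self._clients[label] = ToyClient(server)

    def list_tools(self, label: str) -> List[str]:
        reply = self._clients[label].request("tools/list")
        return [t["name"] for t in reply["result"]["tools"]]

    def call(self, label: str, tool: str, **arguments: Any) -> str:
        reply = self._clients[label].request(
            "tools/call", {"name": tool, "arguments": arguments})
        return reply["result"]["content"][0]["text"]


# Hypothetical usage: a cloud-storage server exposing one upload tool.
netdisk = ToyServer()
netdisk.tool("file_upload", lambda path: f"uploaded {path}")
host = ToyHost()
host.connect("netdisk", netdisk)
uploaded = host.call("netdisk", "file_upload", path="notes.pdf")
```

The split shows why the design opens capabilities to third parties: the host only needs a server's tool names and arguments, so any application that speaks the protocol could call the same tool without knowing how it is implemented.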

The speed with which Baidu Wenku and Netdisk have turned their Servers into tools shows that, in an AI field where architectural innovation builds on the MCP protocol, the relationship is more competitive-cooperative than a fierce battle.

Enterprise-level application cooperation will not be confined to the digital world. For example, Samsung, as a hardware manufacturer, has embraced the value of MCP on the content consumption side.

Currently, Samsung mobile phones are also connecting to multiple MCP Servers of Baidu Wenku and Netdisk, such as file upload, download, retrieval, and content understanding. Once connected, Samsung users can upload files to Netdisk for backup, share them from the cloud, summarize documents, and ask questions about content directly from the voice assistant interface of their phones.

On the other hand, these Servers are also enriching the cloud storage capabilities of the Samsung mobile phone system, addressing the hardware's own pain points in batch backup and in sharing large files and multiple files. For example, the "view-and-save" function for picture, audio, and video files and batch file delivery may be easily realized on Samsung mobile phones in the future.

At the same time, in the field of IoT devices, Baidu Netdisk has also reached a cooperation with Niu Tingting, a leading brand of