AI cloud shifts tracks: The Token battle ends, and those who achieve real-world implementation win
After years of observing the tech industry, I have come to an intuitive conclusion: every industry has its own "battle-report cycle."
Over the past two years, the battle reports in AI circles have revolved around a handful of figures: parameter scale, context window, Token price, and API call volume. Whenever a big tech company held a press conference, the questions the audience cared about most were always the same: How many Tokens does your model support now? How much does it cost per million Tokens?
Tokens have become what DAU was in the mobile-Internet era: a hard yardstick for a tech company's AI strength. The company with cheaper Tokens and greater Token-processing capacity seems to sit closer to the center of the industry's table.
This year, however, model API prices have kept falling fast. After DeepSeek cut the price of V4-Pro in May, the input price on a cache hit dropped to as low as 0.025 yuan per million Tokens. Some in the industry have calculated that it now costs a model less than three yuan to write a 100,000-word novel.
This has sparked a new discussion: When Tokens become cheaper and cheaper, is the Token itself still important?
When the cost of something approaches zero, it stops being a core competitive advantage and becomes infrastructure. Just as no one today chooses an Internet service provider solely because its broadband is cheap, no enterprise in the future will hand its core business systems to an AI cloud provider solely because its Tokens are cheap.
The question then changes: When Tokens ultimately become basic resources like water and electricity, what should the competition be about next?
The answer is not about who can consume more Tokens, but who can complete more complex tasks with fewer Tokens; not about who can make the single-point capabilities of the model more impressive, but who can truly organize the model, computing power, tools, data, and business processes to deliver verifiable results.
In other words, in the second half of the AI cloud era, the competition is about who can achieve implementation.
Implementation is not a single-point ability but a systematic one
At the Baidu Create Conference in May this year, Robin Li offered a noteworthy judgment: in the era of intelligent agents, to measure the prosperity of a platform and its ecosystem, we should pay more attention to the DAA indicator, which focuses on how many Agents are working for humans and delivering results. This is closer to real value than Token consumption.
For an Agent to do real work, it needs the full-stack AI capabilities of an AI cloud provider behind it. Enterprise-level projects in particular are rarely just a matter of connecting to a model API. They require adapting chips to models, optimizing inference costs, integrating with existing business systems, reorganizing employees' established workflows, and sound data governance and permission control. If any one link is weak, an Agent will struggle to move from demo to production.
In the past, many people understood full-stack AI as completeness of the technology stack: anyone with chips, models, cloud services, and applications could call itself full-stack. In enterprise scenarios, however, the full stack that creates real value is not "I have everything," but "I can integrate these capabilities into a business system that can be delivered, operated, and continuously optimized."
I recently sat in on a session at Baidu Smart Cloud's internal strategy seminar, and one case involving a large manufacturing customer left a deep impression on me. The project was to build a resource-operation intelligent agent; it involved complex business processes and enterprise data, and delivery was very difficult. Last year, a large company tackled a similar project by sending 30 senior engineers on site for six months. Baidu sent only two people, one of them an intern. When the agent finally went live on the real production system, the key indicators improved by more than 10%.
What is really worth discussing here is not the headcount but the efficiency gap behind it. A team of dozens has to split into several groups, responsible for computing power, data, models, and applications respectively. When problems arise, communication has to pass through multiple levels, and when requirements change, the groups have to realign again and again. A team of two, by contrast, means each person can work across the entire chain, from the underlying computing power to the upper-layer applications. Problems can be located on the spot, and fixes can be implemented the same day.
When a technical chain is split among multiple suppliers and teams, every change in customer requirements magnifies the time and cost of collaboration: the computing power has to be re-adapted, the model and the inference system re-optimized, and the Agent implementation adjusted accordingly. On the surface, the customer is buying products and technologies at different links; in reality, it is paying for the whole chain to be broken in again and again.
A full-stack AI provider can connect these links, shortening the delivery cycle and lowering cost. Moreover, because the computing power, model, and Agent run in the same system, the provider can optimize them jointly, schedule them in a unified way, and feed results back continuously, systematically improving performance and reducing costs.
This is also the real reason major providers have recently come to treat full-stack AI as an important strategic direction.
The example of China Merchants Bank is more representative.
China Merchants Bank has now launched more than 800 AI intelligent agents, covering almost all core scenarios, including risk control, marketing, R&D, and office work. More than half of them run on domestic computing power provided by Kunlun Chips.
Without full-stack capabilities, this work would have to be done jointly with several companies: chip providers, model providers, cloud providers, and integrators. Each link would appear to have a professional supplier in charge, but at the delivery stage, key issues such as system compatibility, delivery stability, the division of responsibility, and subsequent operation and maintenance support would become extremely complex.
Baidu's newly upgraded full-stack AI cloud essentially turns vertical integration capabilities that have been proven internally into a standardized product for external customers. At the Agent level, it maximizes intelligence per Token so that intelligent agents complete tasks better; at the computing-power level, it maximizes performance per watt and cost-effectiveness. By integrating these two levels efficiently and optimizing end to end, it maximizes overall performance. Customers do not need to care about the underlying technical details, only about whether their business problems get solved.
Implementing AI is difficult, but implementation has never been a technical problem. It is an engineering problem, a collaboration problem, and a cost problem. This is exactly what full-stack AI needs to solve: using a more efficient system to bring AI quickly into core business scenarios and turn technical capabilities into business results.
The window for the positioning battle is narrowing
If last year people were still debating whether intelligent agents were a gimmick, the winning-bid data for the first quarter of this year has given a more realistic answer.
Statistics show that in the first quarter, major domestic cloud providers won a total of 85 projects related to large models, worth 1.65 billion yuan in total.
As the industry moves from the "model-ability competition" stage to the "implementation-ability realization" stage, who can actually win projects and enter customers' production systems has become a new yardstick of competition.
In this data set, Baidu Smart Cloud's lead is obvious: it won 25 projects in the first quarter, with a winning-bid amount of 1.248 billion yuan, more than five times that of the second-place provider.
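To make the scale of that lead concrete, the first-quarter figures quoted above can be turned into shares. The Python sketch below is only back-of-the-envelope arithmetic on the article's numbers. The variable names are illustrative. The per-project averages assume each winning bid counts as one project, and the second-place ceiling follows only from the "more than five times" statement, not from any disclosed figure.

```python
# First-quarter large-model winning bids, as quoted in the article.
TOTAL_PROJECTS = 85
TOTAL_AMOUNT_BN = 1.65    # billion yuan, all major domestic cloud providers
BAIDU_PROJECTS = 25
BAIDU_AMOUNT_BN = 1.248   # billion yuan

# Share of project count versus share of contract value.
project_share = BAIDU_PROJECTS / TOTAL_PROJECTS
amount_share = BAIDU_AMOUNT_BN / TOTAL_AMOUNT_BN

# What is left for every other provider combined.
others_projects = TOTAL_PROJECTS - BAIDU_PROJECTS
others_amount_bn = TOTAL_AMOUNT_BN - BAIDU_AMOUNT_BN

# Average contract size in million yuan (assumes one bid = one project).
baidu_avg_mn = BAIDU_AMOUNT_BN * 1000 / BAIDU_PROJECTS
others_avg_mn = others_amount_bn * 1000 / others_projects

# "More than five times second place" implies second place is below this.
second_place_ceiling_bn = BAIDU_AMOUNT_BN / 5

print(f"Share of projects:       {project_share:.1%}")
print(f"Share of contract value: {amount_share:.1%}")
print(f"Average project size:    {baidu_avg_mn:.1f}M vs {others_avg_mn:.1f}M yuan")
print(f"Second place is below:   {second_place_ceiling_bn:.4f}B yuan")
```

On these figures, roughly 29% of the projects account for about 76% of the contract value, which is consistent with the article's point that the largest deals are concentrated among the leading providers.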
Over a longer time frame, the trend is even clearer. Last year, Baidu Smart Cloud was the "winning-bid king" for large-model projects among domestic cloud providers, with 48 winning bids worth 510 million yuan.
What deserves more attention behind these figures is the nature of the projects. Industries such as finance, manufacturing, energy, communications, government affairs, and large-scale transportation all have demanding requirements for trust and compliance and are sensitive to migration costs. For the core business systems of central and state-owned enterprises in particular, AI is not being implemented as a short-term pilot, but to embed AI capabilities into real business processes and even production systems.
For example, State Grid's intelligent agent for marketing power-supply plans has made the enterprise power-handling process fully intelligent; intelligent assistants in the financial industry have worked their way into the daily work of many bank employees; and operators' infrastructure projects are often large-scale, long-term builds worth hundreds of millions of yuan.
And all of these have gone to the first-tier providers.
The market pattern is becoming increasingly clear: first-tier providers have taken almost all the high-value scenarios, second-tier providers can only compete for the remaining marginal projects, and third-tier providers have begun to withdraw from the market.
The rules of the enterprise-level market have always been like this: once a supplier has implemented a core scenario in an industry, its product is deeply integrated with the customer's processes, and employees have formed new usage habits, later entrants find it very hard to get in because they face high migration costs. Enterprise customers will not lightly replace a system already running in production just because of minor differences in model parameters.
This is the cruelest part of the positioning battle: the window only opens once, and if you miss it, you miss it.
A similar logic holds in the currently hot general-purpose intelligent agent track. Whether it is Codex overseas or, in China, products such as Baidu's DuMate, WorkBuddy, and Trae Work, whoever enters real workflows earlier has a better chance of winning users' mindshare and shaping their habits first. At present, Baidu's DuMate has taken the lead in entering large-scale enterprise-level implementation. Take the ASUS case as an example: relying on Baidu Smart Cloud's Skills ecosystem, users can issue instructions in natural language to mobilize local hardware and software for functions such as automatic email organization, multi-dimensional data analysis, and intelligent PPT generation.
The real value of intelligent agents comes not from a single dazzling demo but from efficiency gains in thousands of specific scenarios. For example, designers at Ziroom can quickly build an AI design platform with the Miaoda code intelligent agent. "Designers can discuss the plan with the owner on-site and produce drawings on the spot. If the owner has new ideas, they can make changes on the spot." As such optimizations accumulate, spreading from one link to more processes, from one position to more roles, and from one industry to more industries, they will add up to the real dividends of industrial intelligence.
This dividend will only belong to the providers that succeed in positioning first.
And this positioning window is closing rapidly.
AI cloud is a good business
When Baidu released its first-quarter financial report this year, many people noticed that AI business exceeded 50% of revenue for the first time. From the perspective of AI cloud, though, two figures matter more: AI cloud revenue reached 8.8 billion yuan, up 79% year-on-year, and GPU cloud revenue grew 184% year-on-year. More remarkable still, this growth comes on top of a 34% year-on-year increase in AI cloud revenue in 2025, which means growth accelerated further after an already sustained run of high growth. Such a growth rate stands out even in the global AI cloud market.
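The growth figures above imply a base that the article does not state directly. The Python sketch below backs it out, assuming the 79% rate compares this year's first quarter with the same quarter a year earlier. The helper `implied_prior` and the variable names are illustrative. Setting the quarterly 79% beside the full-year 34% is only a rough indication of acceleration, since the two rates cover different periods.

```python
def implied_prior(current: float, yoy_growth: float) -> float:
    """Back out the year-ago value from a current value and its YoY growth rate."""
    return current / (1 + yoy_growth)

# Figures quoted in the article.
AI_CLOUD_Q1_BN = 8.8       # billion yuan, first-quarter AI cloud revenue
AI_CLOUD_Q1_YOY = 0.79     # +79% year-on-year
GPU_CLOUD_Q1_YOY = 1.84    # +184% year-on-year
AI_CLOUD_2025_YOY = 0.34   # +34% year-on-year for 2025

# Implied year-ago quarter (assumption: same-quarter comparison).
prior_q1_bn = implied_prior(AI_CLOUD_Q1_BN, AI_CLOUD_Q1_YOY)
added_bn = AI_CLOUD_Q1_BN - prior_q1_bn

# +184% growth means revenue is (1 + 1.84) times the year-ago level.
gpu_multiple = 1 + GPU_CLOUD_Q1_YOY

# Rough acceleration in percentage points (different periods, indicative only).
acceleration_pts = (AI_CLOUD_Q1_YOY - AI_CLOUD_2025_YOY) * 100

print(f"Implied year-ago Q1 AI cloud revenue: {prior_q1_bn:.2f}B yuan")
print(f"Revenue added year-on-year:           {added_bn:.2f}B yuan")
print(f"GPU cloud revenue multiple:           {gpu_multiple:.2f}x")
print(f"Growth-rate gap vs. 2025:             {acceleration_pts:.0f} points")
```

Under that assumption, the year-ago quarter works out to roughly 4.9 billion yuan, so close to 3.9 billion yuan of quarterly revenue was added in a year, and GPU cloud revenue is nearly three times its year-ago level.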
There is no doubt that AI cloud, as one of Baidu's core growth engines, is a good, high-growth business.
Moreover, this growth does not come from simply selling Tokens, but from the joint expansion of demand for intelligent agents, models, and computing power. This brings the AI cloud considerable profit margins.
On the earnings call, Baidu said that GPU cloud is structurally a high-margin business. The reasons are that GPU cloud has a higher barrier to entry and strong demand, that high-quality supply is tight, and that customers readily accept the cost. Self-developed chips and full-stack capabilities offer room to optimize costs further. In the long run, the market for paying for AI intelligent agents and applications themselves may be larger than the market for charging purely by Tokens.
A similar trend is playing out globally. Take Google: in the first quarter of this year, Google Cloud's revenue grew 63% year-on-year, and its operating profit margin reached 32.9%, a year-on-year increase of 15.7%. Its backlog of unfulfilled orders almost doubled quarter-on-quarter, a truly remarkable figure. The "AI cloud" businesses of leading companies such as Amazon, Microsoft, and Oracle have also grown to varying degrees.
AI cloud is a good business, but it is not an easy one.
On the other side of the coin, the competition in AI cloud is evolving into a systematic engineering competition involving chips, models, inference platforms, data engineering, and application delivery. Just selling computing power makes it easy to fall into a price war; just selling models makes it easy to be replaced by cheaper APIs; just selling applications will be restricted by the underlying costs and capacity boundaries.
Those with a real chance to build barriers are the ones who can make full use of the efficiency advantages of full-stack AI and turn AI into business results that customers are willing to keep paying for.
Sustaining a good business tests strategic patience
But few people notice that behind these eye-catching figures lie harder strategic choices.
Especially when GPUs are in short supply, cloud providers must answer a practical question: where should the precious computing power go first?
People close to Baidu Smart Cloud told "New Eye" that in a recent internal management seminar, this question was also repeatedly discussed as a core topic.
This is not a simple business problem but a judgment on the future AI business model.
If too many resources go to MaaS, the Token numbers will certainly look good in the short term. But the Token business is essentially a traffic business without a moat; if the price war keeps intensifying, no one will make money in the end. Conversely, putting key resources into self-developed models and intelligent agent infrastructure builds higher-quality commercialization capability in the long run, but puts the sales team under great pressure in the short term.
This easily makes me think of Microsoft and Google. In the AI era, their core management teams have all gone through similar difficult choices.
Baidu Smart Cloud's current path is closer to the latter: rather than putting all its GPUs into short-term Token revenue, it gives priority to building self-developed models and long-term technology infrastructure, and then allocates resources across projects based on ROI. At the same time, it keeps strengthening the co-optimization of software and hardware and improving the performance of its training and inference systems, trading long-term system capability for higher-quality commercialization in the future.
This is a more difficult and slower path. According to insiders, Shen Dou previously shared a fragment of "On Protracted War" with all management and employees. According to Shen Dou, when doing ToB business, one should be a tree rather than grass: as long as it is a tree, it will grow year after year; if it is grass, no matter how lush it is, it will wither in autumn, and one will have to start all over again the next year.
Competition in the AI industry over the past two years has often been packaged as a 100-meter race: whoever releases models faster, cuts prices deeper, and posts higher API call volume looks like the winner.
But AI cloud is a marathon. Short-term battle reports matter, but what matters more is whether core resources go into long-term value, whether real full-stack capabilities get built, whether high-value implementation scenarios keep coming, and whether, in those scenarios, costs keep falling and each Token is turned into meaningful business results.
The Token war is over.
Next, it's the era of implementers.