
Tokens aren't everything: The competition in the AI cloud is just beginning.

Silicon Star People Pro (硅星人Pro) · 2026-01-07 17:32
A sober reflection on the Token craze.

The cloud market hasn't been this bustling for a long time.

The latest data from IDC shows that in the first half of 2025, China's public cloud market reached 120.669 billion yuan, up nearly 20% year-on-year. Within that, the MaaS market reached 1.29 billion yuan, up 421.2% year-on-year, making it the fastest-growing segment of cloud computing; the AI large-model solutions market reached 3.07 billion yuan, up 122.1%, with finance, government, and manufacturing accounting for more than 60% of spending. There is no doubt that AI has given cloud computing a second growth opportunity.

Lively as it is, the bustle leaves a few things worth staying calm about.

At year's end, companies have been showing their report cards one after another, and Token call volume has become the most fashionable metric. Daily averages in the tens of trillions keep popping up across the industry, evoking the market-share wars of years past. But anyone familiar with cloud computing will remember that the industry stopped emphasizing market share long ago. Not because it is unimportant, but because people gradually realized that in a market still expanding rapidly, market share is only a snapshot of one dimension and explains very little.

To some extent, today's Token call volume is repeating this logic.

The figures are clearer side by side: the MaaS market is 1.29 billion yuan against a total public cloud market of 120.669 billion yuan, just over 1% (1.29 / 120.67 ≈ 1.1%). When all attention is focused on the growth rate and rankings of this 1%, we may be measuring the market with too narrow a ruler.

Token call volume explains some things, but not everything. Competition in cloud computing has never happened only at the application layer. Therefore, Silicon Star People's judgment is that in 2026, competition in the AI cloud will ultimately return to infrastructure and to full-stack capability.

01 Token is a good indicator, but not the only one

There is a reason Token call volume has become a popular metric: it is intuitive, quantifiable, and easy to spread. In a market that needs a shot in the arm, figures like "tens of trillions per day" are naturally impactful. For investors, the media, and even some customers, this is the most understandable evidence of AI's prosperity.

But the problems are also obvious. First, quantity does not equal quality: for the same task, if one model needs 10,000 Tokens and another needs 1,000, which model is the stronger one? Scenario differences are also huge: the Tokens consumed writing marketing copy, recognizing bills for a financial institution, chatting with consumers, or improving efficiency on a production line mean completely different things.
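The "quantity does not equal quality" point can be made concrete with a toy billing calculation (all prices and token counts below are hypothetical, chosen purely for illustration): under pay-per-token billing, the less efficient model generates ten times the "call volume" for the same solved task.

```python
# Toy sketch: same task, two hypothetical models, pay-per-token billing.
# All numbers are invented for illustration; only the arithmetic matters.

def task_cost(tokens_used: int, price_per_1k: float) -> float:
    """Pay-per-use billing: cost scales with token count, not with quality."""
    return tokens_used / 1000 * price_per_1k

PRICE = 0.02  # hypothetical price per 1,000 tokens, in yuan

cost_efficient = task_cost(1_000, PRICE)   # model that solves the task in 1,000 tokens
cost_verbose = task_cost(10_000, PRICE)    # model that needs 10,000 tokens for the same task

# Ranked by "token call volume", the verbose model looks 10x bigger;
# ranked by cost per solved task, it is simply 10x more expensive.
assert cost_verbose == 10 * cost_efficient
```

Which metric you rank by flips which model "wins" — which is exactly why call volume alone cannot distinguish capability from verbosity.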

More importantly, API calls are just one way for enterprises to use AI, and it's the lightest way.

From a cloud provider's perspective, enterprises typically use AI in several ways. The lightest is calling an API directly: the model lives on the cloud, billing is pay-per-use, and this is the main statistical scope of today's Token figures. One level deeper, enterprises do post-training and fine-tuning on the cloud, feeding in their own data to build models suited to their business. Deeper still, enterprises download open-source models and deploy them locally or on private clouds, keeping data in-house and running inference on-premises. And there are more specialized scenarios, such as in-vehicle intelligent driving and on-robot inference, where the model runs directly on the device and must respond in milliseconds, with no time to wait for a round trip to the cloud.
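The four paths above can be sketched as a small data structure (the categories and the "counted" flag restate the paragraph; the names are my own shorthand, not an official taxonomy):

```python
# The four ways enterprises consume AI, per the paragraph above, and
# whether each typically appears in public "Token call volume" statistics.
USAGE_PATHS = {
    "direct_api_call": {"runs_on": "public cloud, pay per use",     "in_token_stats": True},
    "post_training":   {"runs_on": "cloud GPU clusters",            "in_token_stats": False},
    "private_deploy":  {"runs_on": "local / private cloud",         "in_token_stats": False},
    "on_device":       {"runs_on": "vehicles, robots (ms latency)", "in_token_stats": False},
}

visible = [name for name, path in USAGE_PATHS.items() if path["in_token_stats"]]
# Only the lightest path is publicly counted -- the tip of the iceberg.
assert visible == ["direct_api_call"]
```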

The computing power consumed along these paths is enormous as well, yet none of it has ever been included in the statistical scope of any public report.

According to Silicon Star People, on Alibaba Cloud the overlap between customers using API services and customers using GPU computing power is as high as 70%. In other words, enterprises that use AI deeply never choose just one path; they choose different service layers by scenario: some only need API calls, others need to train their own models. Looking only at API-call statistics is like seeing only the part of the iceberg above the waterline.

Beneath the iceberg lies the real picture of enterprises' shift to AI: not simply plugging in an API, but a systematic transformation of data, processes, and organization.

This cannot be rushed and will not happen overnight. No enterprise chooses an AI service by thinking "you have the highest Token call volume, so I'll pick you." What it thinks is: can you solve my problem?

02 The customers of the AI cloud are far more than just Internet companies

What kind of scenarios are consuming Tokens? The answer to this question determines how we understand the real state of the current AI cloud market.

In the current wave of AI enthusiasm in China, the most visible prosperity is concentrated on the consumer side: chatbots, AI singing and dancing, face swapping, virtual companionship... These applications run mainly on mobile phones, with fast user growth, high Token consumption, and good-looking numbers. Meanwhile, the fastest-growing customer group in the MaaS market is AI-native startups and Internet companies. They are natural API customers: their businesses are online, their data is ready, their development capability is strong, and an API call is all they need to get running.

But this is just one aspect of the AI market.

There is a far broader space in the enterprise market and on all kinds of terminals. In 2025, a wave of traditional industries began experimenting with AI: agriculture and animal husbandry use AI for livestock counting and abnormal-behavior detection; the security field is doing multimodal home monitoring, baby care, pet recognition, and fire alarms; heavy industry uses maintenance assistants to shorten the training cycle for senior technicians; education companies do intelligent grading, covering not just multiple-choice but subjective questions; logistics companies equip front-line employees with AI assistants for everyday queries. These scenarios happen not just on phones but on vehicles, robots, industrial equipment, and IoT terminals, with far higher requirements for real-time performance, reliability, and data security than the consumer side.

These enterprises share a common trait: they are not AI-native. They have decades of accumulated business data and industry know-how. What they need is not a simple API call but deep integration of AI with their own data and processes. Many have not even finished digitalization: they need data governance first, then post-training and fine-tuning, and only then application deployment. That is an end-to-end service no single API endpoint can deliver.

This is also why the driving effects of open-source models and closed-source models on the cloud market show different rhythms. The path of closed-source models is more direct: customers call APIs and pay by usage, with clear revenue recognition and a beautiful growth curve. The logic of open-source models is different. After customers download them, they may deploy them locally, use their own GPU clusters for inference, or do post-training on the cloud without going through API calls. These usage behaviors are also happening, but they are difficult to count. The driving effect of open source on the cloud exists, but it is more dispersed, more hidden, and has a longer cycle.

Globally, Alibaba Cloud is a relatively special case. It is one of the few large cloud service providers that simultaneously bet on cloud computing infrastructure and the open-source model ecosystem.

Since the Qwen series of models was open-sourced, global downloads have exceeded 800 million. But how many of those 800 million downloads convert into Alibaba Cloud revenue? That is hard to calculate directly. Open source follows an ecosystem logic, not a transaction logic.

The excitement on the consumer side is just the beginning. The AI transformation of the enterprise-level market is the real tough battle: data governance, process reengineering, and organizational adaptation are all not easy steps. The transformation of the entire industry to AI still has a long way to go.

03 Return to the underlying logic of cloud computing

After all these years of cloud computing's development, one lesson runs deepest: there are no shortcuts in this industry.

Every database product ends up almost completely rebuilt from its launch version after being hardened by dozens of customers. Every layer of service capability is built up through hard, dirty, tiring work. The stability, security, and elastic scaling of infrastructure are not achieved by telling stories; they are polished through countless incident post-mortems, performance optimizations, and architecture iterations. There are no shortcuts to these capabilities, and they cannot be built overnight.

No matter how much imagination space AI brings to this industry, the underlying logic remains the same: the one with more solid infrastructure and more complete full-stack capabilities will go further.

The MaaS competition has never been an isolated one. When an enterprise customer calls a large-model API, what happens behind the scenes is far more complicated than "request, response." Behind it sits the PaaS layer: how data is stored and governed, how it is fed into models for training, how Agent workflows are built. Further down sits the IaaS layer: chips, servers, networks, storage, and GPU cluster scheduling, an entire stack of infrastructure holding it all up. A shortcoming at any layer degrades the overall experience.

This is also why full-stack capability matters even more in the AI era. In past cloud competition, IaaS, PaaS, and SaaS were relatively independent, and customers could buy layer by layer: Company A's computing power today, Company B's database tomorrow. AI has changed that logic. Model training needs massive computing power, inference needs a low-latency network, and data must flow under security and compliance constraints. These links are highly coupled and hard to pull apart. Whoever best combines model capability with infrastructure capability, at the best cost-performance ratio, will hold long-term competitiveness.

In 2026, when more enterprises move from "trying out" to "using deeply" and when AI moves from the consumer side to the production side, the competition will definitely return to the full stack.

But it's still too early to draw conclusions on how this competition will evolve.

In 1996, Motorola's president visited China and predicted that China would have about 1 million mobile phone users by 2000. In fact, China's mobile users passed 100 million in 2000 and 1 billion some years later. By then, Motorola had fallen behind. Predictions made amid technological change often underestimate the market's explosive power and overrate the durability of the short-term competitive landscape.

The MaaS market accounts for just over 1% of the entire cloud computing market, and AI penetration in China's enterprise market has only just begun; 99% of enterprises have yet to truly enter the game.

The development of cloud computing has always been a long process, and there are no shortcuts. Temporary data fluctuations are not worth getting overly excited or overly anxious about. What's really worth paying attention to is who is solidly building infrastructure, who is seriously serving enterprise customers, and who is preparing for the competition in three or five years.

This article is from the WeChat official account "Silicon Star People Pro", author: Yoky, published by 36Kr with authorization.