
With Computing Power in Short Supply, Four Major Forces Pour into the Token Service Market

Digital Intelligence Frontline, 2026-05-27 10:51
What business opportunities are they targeting?

After Tokens became the "hard currency", four forces are rapidly deploying in the market.

“Right now you can sell as many Tokens as you have. The entire market is in short supply,” Xin Zhou, General Manager of Baidu Smart Cloud's AI and Large Model Platform, told Digital Intelligence Frontline. Tokens used to be sold at a discount; now they cannot be bought even at a premium, because inference demand has become extremely strong. Mao Yunhang, co-founder of the AI Infra company Shike Technology, observed that the entire market is shifting from a buyer's market to a seller's market. “Previously, API prices kept falling and the business was unprofitable. Now you have to commit to a certain volume of Token consumption to get a good price and guaranteed supply.” Liu Weiguang, a senior vice president of Alibaba Cloud, offered a figure: over the past five months, Token call volume on Alibaba Cloud has grown 15-fold.

After Tokens became scarce, four forces—major cloud providers, model companies, operators, and AI Infra enterprises—have rapidly deployed in the Token service market.

01

Tokens: From Being Sold at a Discount to Being Unavailable Even at a Higher Price

Although Token sales are booming, cloud providers have not simply sold as many as they can. Instead, they carefully allocate GPU resources to balance internal model training against external Token sales. “People have realized that training better models lets you sell more Tokens,” said a person at a major company. Internal departments compete for GPUs, and the contest is ultimately settled through cost accounting. Last year, moreover, Tokens were sold at a discount and selling hardware directly was more profitable. Now the situation has reversed, and everyone is cutting back on pure hardware sales. “Selling hardware is not as profitable as selling Tokens.”

Why have Tokens become so popular “overnight”? The reason is the explosion of real demand.

Liu Weiguang said that Coding has become a major watershed. It not only produces new applications but will also unlock a large number of legacy systems over the next year: applications once considered “too old to be moved to the cloud” can be revitalized with AI Coding. More importantly, non-programmers have also started “programming”. Anyone can now build their own reports, run analyses, and draw up project budgets, which unleashes productivity.

The popularization of agents has magnified the consumption of Tokens at the technical level. Mao Yunhang described it as “the Tokens are gone before you even do anything”. After agents have “hands and feet”, every step of completing a task consumes Tokens, and the consumption has increased sharply.
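A rough way to see why agent workloads burn through Tokens so quickly is the sketch below. It assumes, purely for illustration, that every step of an agent loop resends the whole accumulated context as input, with no prompt caching. The function name, parameters, and numbers are hypothetical and are not taken from any vendor's billing model.

```python
def agent_task_tokens(steps, system_prompt_tokens,
                      tool_result_tokens_per_step, output_tokens_per_step):
    """Estimate (input_tokens, output_tokens) for one multi-step agent task.

    Assumption: each step sends the full context so far, then appends its
    own output plus a tool result to that context for the next step.
    """
    context = system_prompt_tokens
    total_in = 0
    total_out = 0
    for _ in range(steps):
        total_in += context                      # whole context is re-read
        total_out += output_tokens_per_step      # model writes its action
        context += output_tokens_per_step + tool_result_tokens_per_step
    return total_in, total_out


# Hypothetical comparison: a single chat turn versus a 20-step agent task.
single_turn = agent_task_tokens(1, 1000, 500, 200)
agent_run = agent_task_tokens(20, 1000, 500, 200)
print("single turn:", single_turn)
print("20-step agent:", agent_run)
```

Under these assumptions, input Tokens grow roughly with the square of the number of steps, which is one reason caching and context control come up so often in agent engineering.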

For the past two years, every major company has set sales targets for Tokens. ByteDance tracks total Token volume, while Alibaba, Baidu, and Tencent track the number of model calls. In practice, however, the targets were hard to meet. Xin Zhou explained, “There isn't that much real demand in the market. Many usages are inappropriate or overkill. For example, using large models for data cleaning or tasks that small models can handle. We call this low-quality calling.” As agent technology, models, and Coding capabilities have improved, some truly valuable applications have emerged, and these applications are also heavy consumers of Tokens.

Therefore, each major company has set a significant target for Tokens this year. “This target is based on the judgment of the real market demand.”

Facing this explosion of demand, Zheng Weimin, an academician of the Chinese Academy of Engineering, observed an industry shift: from MaaS (Model as a Service) to TaaS (Token as a Service). Although many enterprises do not specifically distinguish between MaaS and TaaS, their focus has begun to revolve around Tokens.

A Token is the smallest unit of measurement for the information a large model processes. Approximately 1,000 Tokens correspond to 700 to 800 Chinese characters. Zheng Weimin explained that the Token now serves as three kinds of yardstick: it is the basic unit in which large models process information, a measure onto which the varying computing power consumption of AI workloads can be mapped, and increasingly the standard unit for industry pricing and billing.

Previously, MaaS solved the “availability of models”, and its billing was relatively coarse, for example settling by the number of calls. TaaS, by contrast, packages AI computing power into a standardized service like water, electricity, and data traffic, and refines the billing granularity down to the smallest unit, the Token.
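The difference between the two billing styles can be sketched as follows. This is a minimal illustration, not any provider's actual price list: the prices, the `Call` record, and the split into input and output prices are all assumptions, and the character conversion simply uses the midpoint of the article's "1,000 Tokens to 700 to 800 Chinese characters" rule of thumb.

```python
from dataclasses import dataclass


@dataclass
class Call:
    input_tokens: int
    output_tokens: int


def bill_per_call(calls, price_per_call):
    """Coarse billing: every call costs the same, whatever its size."""
    return len(calls) * price_per_call


def bill_per_token(calls, price_in_per_1k, price_out_per_1k):
    """Token-metered billing: charge for the Tokens actually processed."""
    total = 0.0
    for c in calls:
        total += c.input_tokens / 1000 * price_in_per_1k
        total += c.output_tokens / 1000 * price_out_per_1k
    return total


def chinese_chars_to_tokens(chars, chars_per_1k_tokens=750):
    """Rough estimate only; 750 is the midpoint of the 700-800 range."""
    return round(chars / chars_per_1k_tokens * 1000)


# Hypothetical workload: one short chat call and one long agent call.
calls = [Call(1_000, 500), Call(200_000, 4_000)]
print("per call :", bill_per_call(calls, price_per_call=0.01))
print("per token:", bill_per_token(calls, 0.002, 0.006))
print("tokens for 1,500 characters:", chinese_chars_to_tokens(1_500))
```

With per-call pricing the two calls cost the same even though one processes roughly a hundred times more Tokens; per-Token metering makes the charge follow the computing power actually consumed.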

Zheng Weimin explained the deep-seated contradiction behind this evolution: AI infrastructure today is designed mainly for large-model training, and the industry is caught in the dilemma of “expensive computing power infrastructure, weak inference engineering, and weak Token output”. His judgment is that competition in AI infrastructure has shifted from comparing the scale of computing power clusters to comparing the production efficiency of Tokens per watt.
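One way to make "Tokens per watt" concrete is to divide Tokens produced by the energy drawn over the same period, which gives Tokens per joule (equivalently, Tokens per second per watt). The sketch below is only an illustration of that arithmetic; the function names and the two example clusters are hypothetical, and the article does not specify how the metric is measured in practice.

```python
def tokens_per_joule(tokens_generated, avg_power_watts, seconds):
    """Tokens produced per joule of energy (Tokens/s per watt)."""
    if avg_power_watts <= 0 or seconds <= 0:
        raise ValueError("power and duration must be positive")
    return tokens_generated / (avg_power_watts * seconds)


def tokens_per_kwh(tokens_generated, avg_power_watts, seconds):
    """Same metric expressed per kilowatt-hour (1 kWh = 3.6e6 joules)."""
    return tokens_per_joule(tokens_generated, avg_power_watts, seconds) * 3.6e6


# Hypothetical clusters measured over the same one-hour window.
big_cluster = tokens_per_kwh(9_000_000_000, avg_power_watts=1_000_000, seconds=3600)
small_tuned = tokens_per_kwh(2_000_000_000, avg_power_watts=150_000, seconds=3600)
print(f"big cluster : {big_cluster:,.0f} Tokens/kWh")
print(f"small, tuned: {small_tuned:,.0f} Tokens/kWh")
```

In this made-up comparison the smaller cluster produces fewer Tokens in total but more Tokens per unit of energy, which is the kind of efficiency comparison the paragraph describes.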

Competition for the Token market has also heated up quickly. Liu Weiguang of Alibaba estimated that Token expenditure accounts for close to 100% of spending at AI-native startups, between 15% and 20% at domestic Internet companies, and currently below 5% at traditional enterprises. Alibaba Cloud requires that Tokens make up at least 20% of customers' total expenditure this year, and it has created a sales role dedicated to MaaS. For AI-native startups and OPCs (one-person companies), MaaS sales are the main focus. Liu Weiguang also revealed three key points. First, mobilize the entire staff, deploying thousands of salespeople across the country to widen coverage and get customers started, even if only with the most basic Coding transformation. Second, open up the model strategy, so that all models deployed on Alibaba Cloud are treated as first-party models. Third, rebuild the assessment indicators around three things: the daily growth in paying Token customers, the number and efficiency of customers' core systems connected to Tokens, and the efficiency with which agents autonomously complete closed loops inside the enterprise.

Taking central and state-owned enterprises as an example, Xin Zhou estimated that Token expenditure accounts for about 1% of their total IT expenditure, leaving huge room for future growth. Baidu's requirement this year is to make agents deliver results first. Once customers have an expectation of value, penetration and cost reduction can follow.

With computing power tight, domestic AI infrastructure has an opportunity. Mao Yunhang observed that domestic chips are starting to emerge, and some can now support large-scale cluster supply. Shike Technology's domestic adaptation work has likewise evolved from a small-scale or even “volunteer-based” activity into a real production-level demand. “If you can adapt a certain domestic chip now and deploy a new model on it to meet production-level requirements, you can basically activate all the inventory of this chip.”

Liu Weiguang made a more macroscopic prediction. When Tokens cover “everything”, the IT expenditure structure of the entire market will change fundamentally. Software outsourcing and traditional IT procurement will face industrial reshaping. Tokens are becoming the new water and electricity.

02

How the Four Forces Are Moving

After Tokens became the "hard currency", four forces have rapidly deployed: major cloud providers, model companies, operators, and AI Infra enterprises.

Major cloud providers were the first to propose Token services. Their core advantage lies in their full-stack capabilities. They have models, computing power infrastructure, and almost all have chips. Baidu proposed “chip, cloud, model, agent” at this year's Developer Conference, while Alibaba Cloud proposed “chip-cloud-model-inference” at its annual summit. Liu Weiguang of Alibaba Cloud told Digital Intelligence Frontline last year that the “winning factor” for major cloud providers is cost-effectiveness, and full-stack technology is the core path to achieving extreme cost-effectiveness. This year, he especially emphasized the deep integration of chips and models. “Behind each model training, there is strong computing power support. The two are interlocked and develop in a spiral. Therefore, we must take our own path and emphasize the integration of cloud, chip, and model.”

On the product side, cloud providers are moving from cloud-native and AI-native to “Agent-native”. Almost the entire cloud technology stack and service system needs to be redesigned for agent applications. Providers are now systematically reworking their cloud product lines to be Skill-based, MCP-based, and CLI-based. At the same time, cloud providers not only push Token sales but also attach great importance to packaging Tokens into agent applications, such as Coding products, various agents, and tools. Whether for ToC or ToB, they first complete a closed loop from Token production to application.

The second force is model companies, such as Zhipu, MiniMax, and Kimi. They are more focused on the models themselves, which sets them apart from the major cloud providers.

They provide API and Token services and also entrust other parties in the industrial chain to sell model call services. Although some model companies have listed on the Hong Kong Stock Exchange with market values in the hundreds of billions, multiple parties in the industrial chain, such as data center builders, observe that their actual revenue and cash flow are still not large. They therefore generally keep operations light and currently own limited computing power infrastructure. Their focus is on the models themselves. “Selling the developed models” is their core goal, and Tokenization is just a means. For example, the Token packages recently launched by China Telecom Cloud for developers and small and medium-sized enterprises integrate models such as Zhipu GLM5.

The third force is operators. In May, the three major operators all launched Token package services, with China Telecom moving fastest. In fact, as early as the Digital China Summit in April, Liu Guiqing, the general manager of China Telecom, publicly stated that “the traditional industrial division of labor and value distribution model is being reshaped by a new business model centered on Tokens” and disclosed the company's Token-related strategic plan. China Telecom Cloud also began building a full-stack Token service system from IaaS to SaaS. China Telecom then launched a trial commercial Token package in May.

The core advantage of operators is that they not only have a large number of data centers, computing power, and network resources but also have a platform for reaching customers at the last mile and the ability to provide local services across the country. After AI is Tokenized, it is logically similar to phone bills and data traffic and can be billed and operated like water and electricity. Operators cooperate with the ecosystem to jointly develop AI applications and promote the popularization of AI through Tokenized services.

More noteworthy still, operators are the first major force in China to purchase domestic chips on a large scale, and they have a strong motivation to promote adaptation across the domestic chip ecosystem. The industry currently faces challenges such as low computing power utilization, fragmented heterogeneous computing power, difficult domestic adaptation, and rapid model iteration. It can take several months for a domestic chip to be adapted to a new model and meet production-level requirements. During that time, model companies keep launching new models, so the overall pace of adaptation falls far short. Operators therefore use their ecosystem integration capabilities to mobilize all parties for multi-chip adaptation and multi-model integration, making them key promoters of the domestic ecosystem.

The fourth force is AI Infra enterprises, currently the hottest segment for financing. The explosion of agent applications has driven up Token consumption and is reshaping the business logic of these enterprises. Previously, “making a profit from the price difference was not cost-effective”. Now, as the industry shifts from a buyer's market to a seller's market, the commercialization path of this track has become clearer.

Among these enterprises, Shike Technology aims to replicate in China the success of the US company CoreWeave by building an independent third-party domestic GPU cloud ecosystem. It focuses on large-scale cluster operation and domestic chip adaptation, achieved profitability three years ago, and is evolving into an asset-heavy independent third-party cloud platform. SiliconFlow came to the industry's attention last year by partnering with Huawei Cloud and being the fastest to deploy the DeepSeek model. It focuses mainly on the MaaS layer and sits closer to the user side. Wuwen Xinqiong was the first in the industry to propose the “MxN” concept, positioning itself as a middle layer between M types of models and N types of chips.

The industry has observed that the US AI Infra enterprise CoreWeave has limited profit margins because it is squeezed from both sides by leading model companies and NVIDIA. However, Mao Yunhang told Digital Intelligence Frontline that domestic AI Infra enterprises face an important opportunity in domestic adaptation. The domestic market urgently needs domestic chip adaptation. Each chip has a different architecture and its own adaptation difficulties, and hardware manufacturers alone have neither the time nor the resources to solve them. Chip manufacturers, AI Infra enterprises, and application parties need to work together to complete the entire chain. “One is domestic adaptation, and the other is optimization. These are the opportunities we have found in this wave of development,” said Mao Yunhang.

03

Coding and Agents: The Most Reliable “Money-Printing Machines”

Among the many directions of Token services, large language models for Coding and Agents yield the greatest returns. An industry insider told Digital Intelligence Frontline that although the Coding Plans (Coding subscription packages) launched by major companies look cheap, they are actually profitable. The reason is that under monthly pricing, most users consume far less than the upper limit. “On average, the Coding Plan is more profitable than simply selling Tokens.”
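The arithmetic behind that claim can be sketched as below, assuming a flat monthly fee, a per-user Token cap, and a serving cost proportional to Tokens consumed. Every number and name here is hypothetical; the article gives no actual prices, quotas, or costs.

```python
def plan_economics(monthly_fee, quota_tokens, cost_per_1k_tokens, usage_tokens):
    """Return (revenue, serving_cost, margin) for one month of subscribers.

    usage_tokens holds each subscriber's requested usage; anything above
    the quota is not served, so it is capped before costing.
    """
    served = [min(u, quota_tokens) for u in usage_tokens]
    revenue = monthly_fee * len(usage_tokens)
    serving_cost = sum(served) / 1000 * cost_per_1k_tokens
    return revenue, serving_cost, revenue - serving_cost


# Hypothetical month: most subscribers use a small share of a 20M-Token cap.
usage = [1_000_000, 2_000_000, 3_000_000, 25_000_000]
revenue, cost, margin = plan_economics(
    monthly_fee=40.0,
    quota_tokens=20_000_000,
    cost_per_1k_tokens=0.004,
    usage_tokens=usage,
)
print(f"revenue={revenue:.2f} cost={cost:.2f} margin={margin:.2f}")

# Worst case for the provider: every subscriber uses the full quota.
print(plan_economics(40.0, 20_000_000, 0.004, [20_000_000] * 4))
```

In this made-up example the plan is profitable only because typical usage sits far below the cap; if everyone hit the quota, the same fee would lose money, which is why the under-consumption the insider describes matters.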

A senior person further added to Digital Intelligence Frontline that currently, the commercial value of video generation is far lower than that of large language models. Xin Zhou's judgment is more straightforward: once large language models truly enter the production environment, they can generate huge returns, and “there is no upper limit to the revenue”.

Liu Weiguang analyzed this further. He believes that although advertising, media, film and television, and short videos do offer a huge market, they are not in the same league as large language models for Coding and Agents. His reasoning is that Coding is not just programming. Coding has given rise to agents, which can complete tasks independently and help humans raise productivity, and all of this is closely tied to large language models. “The biggest focus of all our efforts now is the large language models in the Coding and Agent directions. The market for this model will be much larger than that of other models.”

Liu Weiguang observed that since the emergence of Coding tools, the development speed of applications has significantly accelerated. He predicted that once “everyone can code” becomes a reality, the number of applications or agents generated each year will be several times that of the past. This is not only a leap in productivity but also a structural reshaping of the entire software industry.

AI Infra enterprises have also noticed what is happening in this track. Mao Yunhang of Shike Technology said that almost every programmer now uses AI. Large companies at home and abroad are using models for Coding, and the entire industry has quietly changed. The rise of agents has further magnified this effect. “How to ensure stable code output, fully utilize the cache, turn code into a complete project, and how to enable agents to produce efficiently within a controllable range: these are the most discussed engineering directions in the industry at present.”

On where Token growth goes next, views in the industry differ. Most people believe that computing power supply will be very tight in 2026 and tighter still over the following two years. Others believe the current Token shortage is tied to chip supply at home and abroad, and that the long-term picture remains to be seen.

However, there is a consensus that, with computing power resources limited, how to maximize the production efficiency of each Token has become a core proposition for releasing AI productivity. “I've noticed that language models are one-dimensional, while driving is a two-dimensional plane, and when it comes to low-altitude, embodied, and