AI usage surged in February: China's AI model call volume exceeded that of the United States for the first time, four Chinese large models took four of the top five spots globally, and demand for domestic computing power is growing exponentially.
In February, the call volume of Chinese AI models grew explosively, surpassing that of the United States for the first time.
Data from OpenRouter, the world's largest AI model API aggregation platform, shows that in the week of February 9 to 15, Chinese models recorded a call volume of 4.12 trillion Tokens, exceeding the 2.94 trillion Tokens of US models over the same period for the first time.
In the week of February 16 to 22, the weekly call volume of Chinese models soared further to 5.16 trillion Tokens, a 127% increase in three weeks, while the call volume of US models over the same period dropped to 2.7 trillion Tokens. Meanwhile, four of the top five models by global call volume are Chinese. This strong growth does not rest on a single blockbuster product but on the cluster-style rise of Chinese AI manufacturers.
A Token is the smallest unit of text an AI model processes. Compared with user counts, Token call volume is the more telling indicator: it reflects an AI model's real usage intensity, user stickiness, and commercial value.
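To make the unit concrete, here is a minimal sketch of counting Tokens with OpenAI's open-source tiktoken tokenizer; each model uses its own vocabulary, so the counts here are purely illustrative and the sample sentence is an invented example.

```python
# Minimal sketch: counting Tokens with the open-source tiktoken tokenizer.
# Token counts differ by model/vocabulary; this only illustrates the idea.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common BPE vocabulary

text = "Chinese AI models surpassed US models on OpenRouter in February."
token_ids = enc.encode(text)

print(len(token_ids))                        # number of Tokens billed for this input
print([enc.decode([t]) for t in token_ids])  # the piece of text each Token covers
```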
Chinese model makers are capturing the global market with rapid iteration and cost advantages, and demand for domestic computing power is growing exponentially.
Leaderboard reshuffle: China's Token call volume surpasses the US for the first time, with four Chinese large models dominating the list
The OpenRouter platform aggregates hundreds of large language models worldwide and serves more than 5 million developer users, making it the world's largest AI model API aggregation platform. Its API call volume is therefore regarded as the most faithful barometer of how AI applications are actually being adopted globally: it directly reflects developers "voting with their feet" and shows how popular and competitive each model is in real-world use.
Notably, the platform's users are mainly overseas developers: US users account for as much as 47.17%, while Chinese developers account for only 6.01%. This makes the leaderboard a comparatively objective measure of the real global appeal of Chinese AI models.
A reporter from National Business Daily (hereinafter "the reporter") reviewed OpenRouter's data and found that global large-model Token call volume has exploded over the past year. In the week of March 3 to 9, 2025, the weekly call volume of the platform's top ten models was only 1.24 trillion Tokens; by mid-February 2026 the figure had soared to 13.95 trillion Tokens, more than a tenfold increase in under a year.
In 2025, US models were the main driver of market growth, at one point accounting for nearly 70% of the weekly Token call volume of the platform's top ten models, while Chinese models accounted for less than 20% over the same period. In 2026, however, US models' growth began to flag, and Chinese models shifted into rapid growth.
Data shows that in the first week of February 2026 (February 2 to 8), the weekly call volume of Chinese models had already jumped to 2.27 trillion Tokens, a strong signal that they were closing in.
Just one week later, in the week of February 9 to 15, Chinese models overtook US models for the first time, logging an astonishing 4.12 trillion Tokens against 2.94 trillion Tokens for US models over the same period, a historic reversal.
The momentum did not stop there. In the week of February 16, Chinese models' weekly call volume soared to 5.16 trillion Tokens, up 127% in three weeks, further widening the lead.
This strong growth momentum does not rely on a single blockbuster product but on the cluster-style rise of Chinese AI manufacturers.
The weekly list for February 16 to 22, 2026, shows that four of the platform's top five models by call volume come from Chinese manufacturers: MiniMax's M2.5, Dark Side of the Moon's Kimi K2.5, Zhipu's GLM-5, and DeepSeek's V3.2. Together, these four models contributed 85.7% of the Top 5's total call volume.
Specifically, the M2.5 model released by MiniMax on February 13, 2026, topped the weekly call volume list less than a week after launch. Of the 3.21-trillion-Token surge in total call volume on OpenRouter in the week of February 9 to 15, M2.5 alone contributed an astonishing 1.44 trillion Tokens.
The Kimi K2.5 model, released by Dark Side of the Moon on January 27, has seen its call volume jump repeatedly thanks to its native multimodal architecture and strong parallel Agent capabilities. The model can dispatch up to 100 "Agent avatars" to work in parallel, raising the efficiency of complex task processing by 3 to 10 times. According to media reports, Kimi's cumulative revenue in less than a month after K2.5's release exceeded its total revenue for all of 2025, driven mainly by a sharp increase in global paying users and API call volume.
Since Zhipu released its flagship model GLM-5 on February 12, its user base has grown rapidly on the strength of a 200K ultra-long context window and deep optimization for long-horizon Agent tasks. Its call volume climbed to 0.8 trillion Tokens in the second week after launch.
Over the past year, Alibaba's Qwen has rarely appeared at the top of the single-model rankings, but a report jointly released by a16z and OpenRouter shows that the total Token call volume of its full model family ranks second globally at 5.59 trillion, behind only DeepSeek (14.37 trillion).
A report from consulting firm Frost & Sullivan shows that in China's enterprise (B-side) large-model market in the second half of 2025, the Qwen series accounted for 32.1% of average daily Token calls, ranking first, nearly double its 17.7% share in the first half of the year and widening its lead over ByteDance's Doubao (21.3%) and DeepSeek (18.4%).
Commenting on the landscape of Chinese large AI models, Hu Yanping, a specially appointed professor at Shanghai University of Finance and Economics, proposed the concept of a "Chinese AI Group" in an interview with the reporter.
He believes that higher market concentration in the industry is not necessarily better. A broad technological and industrial community formed by multiple leading companies, rather than a few oligopolists, is better for competition, innovation, and building a talent ecosystem, and it also helps form a cluster advantage in the Sino-US AI race.
Martin Casado, a partner at the well-known venture capital firm Andreessen Horowitz (a16z), has observed that among AI startups seeking funding in Silicon Valley today, as many as 80% are pitching core products built on Chinese open-source models.
Competitiveness: costs less than one-tenth of US AI. Why are Chinese Tokens so cheap?
Chinese models have been able to win over global developers so quickly not only because their performance matches or even surpasses that of top international models, but also because of another undeniable core advantage: highly competitive cost.
Taking the prices announced by the OpenRouter platform as an example, the cost advantage of Chinese models is obvious at a glance.
For processing input, MiniMax's M2.5 and Zhipu's GLM-5 are both priced at $0.3 per million Tokens. By contrast, the mainstream overseas comparable, Claude Opus 4.6, charges as much as $5 per million Tokens, roughly 16.7 times the price of the two Chinese models.
For generating output, the cost gap is even wider. MiniMax M2.5 charges $1.1 per million output Tokens and Zhipu GLM-5 $2.55, while Claude Opus 4.6 jumps to $25 per million Tokens, roughly 22.7 times and 9.8 times the two Chinese models respectively.
A cost gap this large directly shapes the economics developers weigh when choosing an API, as the rough calculation below illustrates.
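As a back-of-the-envelope sketch, the snippet below applies the per-million-Token prices quoted above to a hypothetical monthly workload; the workload size is an assumption for illustration only, not a figure from the article.

```python
# Minimal sketch of the cost arithmetic above, using the per-million-Token prices
# quoted from OpenRouter in this article. The workload (10M input / 2M output
# Tokens per month) is a hypothetical example, not a figure from the report.
PRICES = {  # (input $, output $) per million Tokens
    "MiniMax M2.5":    (0.30, 1.10),
    "Zhipu GLM-5":     (0.30, 2.55),
    "Claude Opus 4.6": (5.00, 25.00),
}

input_mtok, output_mtok = 10, 2  # hypothetical monthly workload, in millions of Tokens

for model, (p_in, p_out) in PRICES.items():
    cost = input_mtok * p_in + output_mtok * p_out
    print(f"{model}: ${cost:.2f}")
# MiniMax M2.5: $5.20
# Zhipu GLM-5: $8.10
# Claude Opus 4.6: $100.00
```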
This significant cost difference first stems from the architectural innovation at the algorithm level.
Li Qing, China director at Frost & Sullivan, said in an interview with the reporter that the technical route represented by the Mixture-of-Experts (MoE) architecture is one of the core reasons Chinese models can cut inference costs so sharply. Models on the list such as DeepSeek, as well as Alibaba's Tongyi Qianwen 3.5-Plus, have widely adopted the MoE architecture.
The ingenuity of the MoE architecture lies in splitting one huge model into many relatively small "expert networks" plus a "gating network". The model's total parameter count can still be very large (hundreds of billions of parameters, for example), preserving its "knowledge reserve" and capability ceiling, but when a task actually arrives, the gating network judges its nature and activates only the small subset of experts most relevant to it.
This "activate on demand" rather than "mobilize everything" mode dramatically reduces computation and hardware requirements compared with traditional dense models, which use all parameters in every calculation. Data shows that adopting an MoE architecture can cut GPU memory usage during inference by 60% and raise inference throughput (Tokens processed per unit time) by up to 19 times. This cost reduction and efficiency gain, achieved at the technical source, is the fundamental root of the cost advantage.
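For readers who want to see the mechanism, here is a minimal, self-contained numpy sketch of top-k MoE routing; the dimensions, expert count, and weights are arbitrary illustrative assumptions and do not correspond to any model named in the article.

```python
# Toy numpy sketch of top-k MoE routing, illustrating "activate on demand":
# only k of the N experts run for each token, so compute scales with k, not N.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

gate_w = rng.normal(size=(d_model, n_experts))                   # gating network
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):  # x: (d_model,) hidden state for one token
    logits = x @ gate_w
    probs = np.exp(logits - logits.max()); probs /= probs.sum()   # softmax over experts
    chosen = np.argsort(probs)[-top_k:]                           # pick the top-k experts
    weights = probs[chosen] / probs[chosen].sum()                 # renormalize their scores
    # Only the chosen experts compute; the other n_experts - top_k stay idle.
    return sum(w * (x @ experts[i]) for i, w in zip(chosen, weights))

out = moe_layer(rng.normal(size=d_model))
print(out.shape)  # (64,) -- same output size, but only 2 of the 8 experts ran
```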
Beyond algorithmic architecture, Chinese AI vendors are also pursuing "vertical integration" to squeeze the cost behind each Token further. The core idea is to co-design and jointly optimize the upper-layer model algorithms, the middle-layer cloud computing infrastructure, and the bottom-layer AI chips, resolving software-hardware adaptation pain points and extracting every last bit of computing power.
Li Qing cited Alibaba's "Tongyi-cloud-chip" stack as an example. This top-down vertical integration uses aggressive compute-scheduling algorithms to use the underlying hardware as efficiently as possible, significantly lowering the infrastructure cost behind AI services. Such system-level optimization pushes the cost of generating each Token down further.
JPMorgan Chase made an extremely optimistic forecast for the Chinese market in a research report, estimating that from 2025 to 2030 China's Token consumption will grow at an astonishing compound annual rate of 330%, a 370-fold increase in just five years.
Qualitative change in value: Tokens are changing from Internet "traffic" to "fuel" in the AI era
On the surface, the exponential rise in Token consumption looks like the result of more users spending more time with AI, but the deeper driver is a fundamental change in how people use it. AI is evolving from a "Q&A tool" for simple information lookup and casual chat into a "productivity tool" that participates deeply in workflows and handles complex tasks.
In a recent research report, Guolian Minsheng Securities put forward the concept of "Token inflation". This does not mean Tokens themselves are getting more expensive; it refers to a structural rise in Token consumption per user per unit of time. The report attributes the phenomenon to three core trends.
First, users' core needs are shifting from shallow "Q&A" to deep "work": more and more people use AI to refactor code, rewrite files, generate documents, and run tests. Programming scenarios naturally involve long contexts, multiple rounds of iteration, and large outputs, all of which consume large numbers of Tokens.
Second, the rise of AI Agent technology multiplies Token consumption. Agents actively plan, retrieve, execute, and reflect, calling the model many times, so Token consumption accumulates step by step (a simplified sketch of this loop follows these three points).
Finally, inference intensity is increasing. Deeper reasoning and longer chains of inference significantly raise the Tokens consumed by outputs and intermediate steps. For developers, though, this usually means a higher success rate and less rework, so users are in practice willing to spend more Tokens in exchange for efficiency.
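To make the second trend concrete, here is a toy sketch of why an Agent loop multiplies Token consumption; the step counts and Token figures are invented assumptions for illustration, not measurements from the report.

```python
# Toy sketch: each plan/retrieve/execute/reflect step is a separate model call,
# and the growing conversation history is re-sent as input every time.
# All numbers are illustrative assumptions.
STEPS = ["plan", "retrieve", "execute", "reflect"] * 3   # 3 iterations of the loop
history_tokens = 500                                      # initial task description
total_input = total_output = 0

for step in STEPS:
    total_input += history_tokens          # the whole history goes back in as input
    output = 400                           # assumed Tokens generated by this step
    total_output += output
    history_tokens += output               # the step's output joins the context

print(total_input, total_output)           # 32400 4800
# A one-shot answer to the same task might cost ~500 input + ~400 output Tokens;
# run as a 12-step Agent loop it consumes roughly 40 times as many in this toy example.
```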
Together, these changes mean that Tokens are no longer the near-zero-marginal-cost "traffic" of the traditional Internet era but the "fuel" required to run production tasks.
This trend matches the judgment of top global chip makers. On the February 26 earnings call, Nvidia CEO Jensen Huang repeatedly stressed one core view: "Computation is revenue" and "Inference is revenue". Without computing power, he noted, Tokens cannot be generated; without Tokens, revenue cannot grow. In the AI era, inference performance directly determines customers' ability to generate revenue, and the heart of inference is efficiently producing commercializable Tokens. Today, as the power bottlene