In the fierce Token battle, Huawei Cloud has chosen a third path | Frontline
Author | Deng Yongyi
Editor | Zhang Yuxin
"In the context of limited domestic computing power supply, Huawei Cloud doesn't really care about the total amount of Tokens or the total revenue. What it cares about is the health of the Tokens produced by the domestic computing power system, which should represent an improvement in productivity, not just emotional value."
So said Zhou Yuefeng, CEO of Huawei Cloud, at the Huawei Cloud INSPIRE Creator Conference, which opened in Shanghai on June 5th, 2026.
He gave an example: when a person idly asks an AI a question on their phone, Tokens are generated, but it is hard to tell how much value those Tokens have. In his view, a cloud service should be judged not by how many trillions of Tokens it generates, but by how much efficiency those Tokens bring to enterprises.
Zhou Yuefeng, CEO of Huawei Cloud
Over the past two years, Chinese cloud providers have waged a protracted price war over Tokens. In May 2024, after DeepSeek V2 fired the first shot in price cuts, Volcengine's Doubao ignited the war with a price of 0.0008 yuan per thousand Tokens. Alibaba, Baidu, Tencent, and iFlytek then entered the fray one after another, marking the first battle among model providers.
The core of this strategy was to use low-cost models to attract users and drive sales of the underlying public cloud. The price paid was that the gross margin on inference computing power was at one point pushed into negative territory. It wasn't until DeepSeek R1 opened up the paradigm of the inference era that coding and video models reignited the real Token war.
At this conference, Huawei Cloud didn't play the price-cut card. Instead, it proposed a new paradigm called "Agentic Infra", built on domestic computing power. While everyone else competes over who can offer cheaper Tokens and who can post higher call volumes, Huawei Cloud has chosen a third path in the Token economy: rather than competing on unit price and call volume, it is betting on the autonomy and controllability of domestic computing power and on whether it can help enterprises improve real productivity.
Don't Engage in Price Wars, Build the Token Factory on Domestic Hardware
To deliver on Agentic Infra, Huawei Cloud has rolled out a complete set of underlying infrastructure.
Agentic Infra covers four areas: an efficient Token factory, continuous learning, integrated scheduling of general and intelligent computing, and secure, autonomous operation. Four new products were released to match.
The core component is the AICS Lingqu Intelligent Computing Cluster. Built on the Lingqu network, it supports clusters of up to 100,000 cards with a total computing power of 200 EFLOPS. It can reduce Token generation latency to less than 10 milliseconds, achieve a throughput of 5 million Tokens per second per thousand cards, and offers online service availability of 99.95%. Huawei Cloud calls it the "Token factory".
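To put those headline figures in perspective, the following back-of-the-envelope sketch derives per-card numbers from them. It assumes the figures scale linearly across cards and that availability is measured over a 365-day year; both are simplifying assumptions for illustration, not claims made by Huawei Cloud, and the function names are hypothetical.

```python
# Figures quoted in the article for the AICS Lingqu cluster.
MAX_CARDS = 100_000
TOTAL_EFLOPS = 200
TOKENS_PER_SEC_PER_1K_CARDS = 5_000_000
AVAILABILITY = 0.9995


def per_card_pflops(total_eflops: float, cards: int) -> float:
    """Average compute per card in PFLOPS (1 EFLOPS = 1,000 PFLOPS)."""
    return total_eflops * 1000 / cards


def per_card_tokens_per_sec(tokens_per_sec_per_1k_cards: float) -> float:
    """Average throughput per card, assuming an even split across 1,000 cards."""
    return tokens_per_sec_per_1k_cards / 1000


def cluster_tokens_per_sec(tokens_per_sec_per_1k_cards: float, cards: int) -> float:
    """Throughput of a full cluster, ASSUMING perfectly linear scaling."""
    return per_card_tokens_per_sec(tokens_per_sec_per_1k_cards) * cards


def annual_downtime_hours(availability: float) -> float:
    """Downtime allowed per 365-day year at the given availability."""
    return (1 - availability) * 365 * 24


if __name__ == "__main__":
    print(per_card_pflops(TOTAL_EFLOPS, MAX_CARDS))
    print(per_card_tokens_per_sec(TOKENS_PER_SEC_PER_1K_CARDS))
    print(cluster_tokens_per_sec(TOKENS_PER_SEC_PER_1K_CARDS, MAX_CARDS))
    print(annual_downtime_hours(AVAILABILITY))
```

Under these assumptions, the quoted figures work out to about 2 PFLOPS and 5,000 Tokens per second per card, and roughly 4.4 hours of permitted downtime per year. Real clusters rarely scale perfectly linearly, so the full-cluster throughput should be read as an upper bound.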
The accompanying CCE Volcano Next scheduling engine uses a "shared pool for training and inference + fragmentation integration" approach to schedule general and intelligent computing power together, raising resource utilization by more than 30%. The AMS memory storage solution builds a PB-level memory space on NPU-direct hardware, and AgentSphere provides a secure runtime environment for agents with a startup time of 100 milliseconds.
At the model level, Huawei Cloud simultaneously released its new-generation training and inference platform, ModelArts Next. Among other features, its MaaS model router automatically dispatches each request to the most suitable model based on the request's characteristics. It currently connects more than 15 state-of-the-art models, and according to the company, scheduling accuracy exceeds 95% and call costs fall by an average of 20%.
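The article does not describe how the MaaS model router works internally. As a rough illustration of the general idea of routing by request characteristics, the sketch below picks the cheapest model that covers everything a request needs. The model names, prices, and skill tags are all hypothetical, and a simple rule-based policy stands in for whatever Huawei Cloud actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model name
    cost_per_1k_tokens: float  # hypothetical price, arbitrary currency
    skills: frozenset          # capabilities this model is assumed to have


@dataclass(frozen=True)
class Request:
    prompt: str
    needs: frozenset           # capabilities the request requires


# Hypothetical catalog; not the models connected to ModelArts Next.
CATALOG = [
    Model("small-chat", 0.001, frozenset({"chat"})),
    Model("code-pro", 0.004, frozenset({"chat", "code"})),
    Model("reasoner-xl", 0.010, frozenset({"chat", "code", "reasoning"})),
]


def route(request: Request, catalog=CATALOG) -> Model:
    """Return the cheapest model whose skills cover the request's needs."""
    candidates = [m for m in catalog if request.needs <= m.skills]
    if not candidates:
        raise ValueError(f"no model supports {sorted(request.needs)}")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    for needs in ({"chat"}, {"code"}, {"reasoning", "code"}):
        req = Request(prompt="example", needs=frozenset(needs))
        print(sorted(needs), "->", route(req).name)
```

The cost saving in a scheme like this comes from not sending simple requests to the most expensive model. A production router would presumably infer the request's needs from the prompt itself rather than receive them as explicit tags.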
Huawei Cloud has also launched a series of features that address pressing customer needs in the enterprise scenarios where it is strongest. For example, ModelArts Next packages reinforcement learning as an enterprise-level RLaaS service and provides confidential inference capabilities, ensuring that data in high-sensitivity scenarios such as finance and coding "only goes in and doesn't come out".
What makes this third path possible is the Ascend ecosystem underpinning the infrastructure. At the beginning of this year, when DeepSeek became extremely popular, Huawei Cloud and SiliconFlow deployed DeepSeek-R1/V3 on the Ascend CloudMatrix 384 supernode, and inference efficiency at the time could match that of the NVIDIA H800. This means that domestic computing power can already deliver usable performance for inference on mainstream large models.
From the Computing Power Base to Industry Implementation, Betting on the "Most Open Cloud"
If the base solves the problem of "where the computing power comes from", Huawei Cloud wants to focus more on "where the Tokens go", that is, how they translate into productivity in specific industries.
Going one step further, Huawei Cloud has launched a public beta of the Zhiguo AgentArts enterprise-level agent platform and has simultaneously released the open-source version, openJiuwen. The core of the open-source version has more than 90% in common with the enterprise version.
On what differentiates Huawei Cloud in the AI era, Zhou Yuefeng repeatedly emphasized that Huawei Cloud has an ecosystem built on open Ascend and Kunpeng computing power, the open-source Euler operating system, and an open-source ModelArts toolchain. It is, he said, the "most open cloud in the agent era".
This openness also extends to the model ecosystem. At the conference, Huawei Cloud and more than 20 model providers, including Zhipu, DeepSeek, Kimi, Jieyue Xingchen, and Baidu, jointly launched the "Hundreds of Models, Thousands of Forms" cooperation plan.
What Zhou Yuefeng truly regards as the "productivity landing point", however, is industry scenarios.
Huawei Cloud has already helped enterprises across many industries and scenarios put AI to use. For example, the CloudRobo development platform, which Huawei Cloud launched for the embodied intelligence industry, lets small and medium-sized enterprises access and share data and models at low cost.
"There are more than 300 embodied intelligence startups in China, and none of them is large. It would be too much of a burden for them to build their own computing power and data chains independently," Zhou Yuefeng explained.
Healthcare is one of the key industries in which Huawei Cloud has invested. Huawei established a medical business group in 2025, and in the large-model era Huawei Cloud now has a representative case to show.
China currently has only about 20,000 doctors who can read pathology slides, leaving a huge shortfall, and the misdiagnosis rate of pathological examinations in remote hospitals is relatively high.
In response, Huawei Cloud and Ruijin Hospital have launched a jointly developed pathology large model. It lets county-level and prefecture-level hospitals access the diagnostic capabilities of top-tier hospitals through the cloud, sparing patients the need to travel long distances.
In addition, in view of the concerns that governments, financial institutions, and central state-owned enterprises have about data security and localization, Huawei Cloud has also released a hybrid-cloud white paper for agents and a confidential computing solution, taking a "two-pronged approach" that spans public cloud and private cloud.
"Huawei Cloud aims to be the silicon-based fertile ground." Zhou Yuefeng repeatedly mentioned that Huawei Cloud wants to build a "second computing power plane". This means not competing on scale with "global-sourced" computing power such as NVIDIA's, but offering developers worldwide an additional technical route and ecosystem option.