Xiaomi has joined the price war initiated by DeepSeek, slashing API prices by up to 99% to compete head-on.
DeepSeek has just announced a permanent price cut for its API, and Xiaomi has followed suit.
According to Zhidx, today, May 27th, Xiaomi officially announced a permanent price reduction for the MiMo-V2.5 series API and, at the same time, an upgraded Token Plan billing system. Compared with the original pricing, the new prices are up to 99% lower, and pricing no longer varies with context window length.
▲Permanent price reduction for the MiMo-V2.5 series API (Source: Xiaomi)
▲Xiaomi MiMo's new Token Plan billing system: The pricing remains the same, but the Credits have been significantly increased (Source: Xiaomi)
Just a few days ago, DeepSeek announced that starting from June 1st, the current promotional price of DeepSeek-V4-Pro will become the official price and will not revert to the original. Previously, DeepSeek had cut the V4-Pro API price to 25% of the original and further reduced the input cache hit price to 1/10 of the original.
Within just one week, two domestic large model manufacturers have successively chosen to implement "permanent price cuts", and the domestic API price war has heated up again.
This time, Xiaomi MiMo has almost directly matched the current API price of DeepSeek. The updated price list shows that the input cache hit price of MiMo-V2.5 has dropped to 0.02 yuan per million tokens, the price for non-hit input is 1 yuan per million tokens, and the output price is 2 yuan per million tokens; for MiMo-V2.5-Pro, they are 0.025 yuan, 3 yuan, and 6 yuan respectively.
▲Comparison table of the API prices of DeepSeek and Xiaomi MiMo (Compiled by Zhidx)
Notably, MiMo has also dropped its previous context-length-based pricing. Whether the context window is 256K or 1M, the same price applies.
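To make the price list quoted above concrete, here is a minimal Python sketch that estimates a bill from token counts and a cache hit rate. The per-million prices come from the article; the helper name `estimate_cost_yuan`, the dictionary keys (display names, not confirmed API model identifiers), and the example workload are assumptions for illustration only, not Xiaomi's billing implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Yuan per million tokens, copied from the price list quoted above."""
    cache_hit_input: float
    cache_miss_input: float
    output: float


# Keys are display names used in this article, not confirmed API identifiers.
PRICES = {
    "MiMo-V2.5": Price(cache_hit_input=0.02, cache_miss_input=1.0, output=2.0),
    "MiMo-V2.5-Pro": Price(cache_hit_input=0.025, cache_miss_input=3.0, output=6.0),
}


def estimate_cost_yuan(model: str, input_tokens: int, output_tokens: int,
                       cache_hit_rate: float) -> float:
    """Estimate the bill in yuan for a workload (hypothetical helper)."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    price = PRICES[model]
    # Split input tokens into cache hits and misses, which are priced differently.
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (hit_tokens * price.cache_hit_input
             + miss_tokens * price.cache_miss_input
             + output_tokens * price.output)
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 10M input tokens, 1M output tokens, 90% cache hits.
    for name in PRICES:
        cost = estimate_cost_yuan(name, 10_000_000, 1_000_000, cache_hit_rate=0.9)
        print(f"{name}: {cost:.3f} yuan")
```

Because context length no longer affects the price, the sketch needs no context-window parameter; only the hit/miss split and the output volume matter.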
Yesterday, we conducted an in-depth review and comparison of the subscription packages and API call billing of dozens of mainstream large model manufacturers at home and abroad.
Among them, DeepSeek chose to implement a "permanent price cut", Alibaba suspended the Lite low-price package in its Coding Plan, ByteDance removed its low-price Coding Plan, and Zhipu raised its API call pricing by 83% in the first quarter of 2026. According to incomplete statistics, at least 5 domestic large model manufacturers, including Xiaomi, ByteDance, Alibaba, Zhipu, and Tencent, have made significant adjustments to their package systems in the past six months. Some manufacturers have begun cutting low-price packages and quotas, and overall prices have risen.
Interestingly, not long ago Luo Fuli, who leads Xiaomi's MiMo large model effort, publicly criticized the industry price war. At the same time, Xiaomi MiMo once topped the global call volume list on Hermes with its "100 trillion Token free plan".
▲Partial screenshot of Luo Fuli's post on X (Source: X)
Now, Xiaomi MiMo has officially joined this long-term price war.
01.
Major overhaul of the Token Plan:
The package capacity has increased by 5-8 times
In addition to the permanent price cut for the API, the biggest change this time is actually the Token Plan package system.
Xiaomi said the billing rules have been reorganized: with prices unchanged, package Credits have been increased significantly, and the usable quota is generally 5-8 times the original level.
Among the mainstream domestic large model subscription packages we reviewed, Xiaomi's adjusted Lite package is priced close to the lowest tiers of vendors such as Kimi, ByteDance, and Jieyue Xingchen, but it is not the cheapest on the market: Tencent's Hunyuan Hy still offers a Lite tier at 28 yuan per month.
At the high end, Xiaomi's Max package is not the most expensive either. Currently, Alibaba's premium version costs 1398 yuan per month, ByteDance's Agent Plan Max tier is 950 yuan per month, and MiniMax's Ultra Express version is close to 750 yuan per month.
▲Comparison of the prices of domestic large model subscription packages (Compiled by Zhidx, statistics as of May 27, 2026)
Meanwhile, Xiaomi also announced the new conversion relationship between Credits and Tokens and publicly provided the approximate actual Token scale corresponding to different packages in high cache hit scenarios.
▲Conversion relationship between Xiaomi MiMo Credits and Tokens
According to the calculation of the scenario with a cache hit rate of over 95% provided by Xiaomi, if using MiMo-V2.5, the 39-yuan Lite package can theoretically reach over 500 million Tokens, the 99-yuan Standard tier can exceed 1.3 billion Tokens, the 329-yuan Pro tier can reach over 4.7 billion Tokens, and the 659-yuan Max tier can exceed 10 billion Tokens.
Even for the more expensive MiMo-V2.5-Pro, in high cache hit scenarios, the 39-yuan Lite tier can reach over 190 million Tokens, the 99-yuan package can exceed 500 million Tokens, the 329-yuan tier can reach about 1.8 billion Tokens, and the 659-yuan Max tier can be close to 4 billion Tokens.
Xiaomi specifically emphasized that in Agent and Code scenarios, since the cache hit rate is usually higher, the actual number of available Tokens will increase significantly.
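The package figures above can be turned into an effective price per million Tokens with simple division. The short Python sketch below does this under stated assumptions: the package prices and Token totals are the rounded figures quoted in this article for the scenario with a cache hit rate of over 95%, and since the article gives them as "over", "about", or "close to", the results are rough estimates rather than official rates. The function name is hypothetical.

```python
# Package prices in yuan, as quoted in the article.
PACKAGE_PRICE_YUAN = {"Lite": 39, "Standard": 99, "Pro": 329, "Max": 659}

# Approximate Token totals in millions at a >95% cache hit rate, as quoted above.
# These are rounded figures ("over", "about", "close to"), so treat them as rough.
TOKENS_MILLIONS = {
    "MiMo-V2.5": {"Lite": 500, "Standard": 1_300, "Pro": 4_700, "Max": 10_000},
    "MiMo-V2.5-Pro": {"Lite": 190, "Standard": 500, "Pro": 1_800, "Max": 4_000},
}


def yuan_per_million_tokens(model: str, tier: str) -> float:
    """Effective package price per million Tokens (hypothetical helper)."""
    return PACKAGE_PRICE_YUAN[tier] / TOKENS_MILLIONS[model][tier]


if __name__ == "__main__":
    for model, tiers in TOKENS_MILLIONS.items():
        for tier in tiers:
            rate = yuan_per_million_tokens(model, tier)
            print(f"{model} {tier}: ~{rate:.3f} yuan per million Tokens")
```

On these figures the effective rate falls as the tier rises, which is consistent with larger packages being cheaper per Token; actual usage with a lower cache hit rate would yield fewer Tokens per yuan.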
Xiaomi's "100 Trillion Token Creator Incentive Plan" has also drawn attention in the developer community. Xiaomi disclosed that as of 4:08 p.m. on May 26th, all 100T Tokens had been distributed and the event ended ahead of schedule. For all Token Plan users whose packages are still within the validity period, regardless of how much of their previous packages they used, Credits will be reset at 0:00 on May 27th and the packages will switch automatically to the new billing rules.
Xiaomi will also provide additional benefits to historical paid users whose packages have expired.
02.
Why does Xiaomi dare to cut prices?
Continuous optimization of the inference system
Xiaomi also explained the inference optimizations behind the price cut.
Building on SGLang HiCache, the Xiaomi team added full support for SWA (Sliding Window Attention), cutting the volume of KV Cache data transferred between storage tiers such as GPU memory, CPU memory, and SSD to about 1/7 of the pre-optimization level and raising the number of cacheable Tokens to about 5 times the previous figure. Xiaomi has also optimized mechanisms such as its expert parallelism scheme and input length bucketing strategy to further raise cluster input throughput, thereby lowering the cost per Token.
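As a rough illustration of why sliding-window layers shrink the KV Cache that must be stored or moved, the Python sketch below counts per-layer Token entries: a full-attention layer keeps every Token, while a sliding-window layer only needs the most recent window. The layer counts, window size, and function names are hypothetical; this is not MiMo's actual architecture, does not model SGLang HiCache itself, and is not meant to reproduce the 1/7 figure.

```python
def kv_entries(seq_len: int, full_layers: int, swa_layers: int, window: int) -> int:
    """Count (layer, Token) entries that must stay in the KV Cache.

    Full-attention layers keep every Token; sliding-window layers only keep
    the most recent `window` Tokens.
    """
    if seq_len <= 0 or window <= 0:
        raise ValueError("seq_len and window must be positive")
    if full_layers < 0 or swa_layers < 0 or full_layers + swa_layers == 0:
        raise ValueError("layer counts must be non-negative and not both zero")
    return full_layers * seq_len + swa_layers * min(seq_len, window)


def reduction_ratio(seq_len: int, full_layers: int, swa_layers: int, window: int) -> float:
    """Entries kept with window-aware caching, relative to caching all Tokens in all layers."""
    baseline = (full_layers + swa_layers) * seq_len
    return kv_entries(seq_len, full_layers, swa_layers, window) / baseline


if __name__ == "__main__":
    # Hypothetical configuration: 8 full-attention layers, 40 sliding-window
    # layers, and a 4096-Token window, evaluated at several context lengths.
    for n in (4_096, 65_536, 262_144):
        print(n, round(reduction_ratio(n, full_layers=8, swa_layers=40, window=4_096), 3))
```

The ratio falls as the context grows, because the sliding-window layers contribute a fixed number of entries while the baseline grows linearly; that is the general reason long-context caching benefits most from this kind of optimization.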
Put simply, Xiaomi's core logic this time is actually similar to that of DeepSeek: on one hand, a more aggressive cache hit strategy, and on the other hand, higher inference throughput efficiency.
Behind the price war, the essence is still the competition of inference systems and infrastructure capabilities.
03.
After the model's capabilities have entered the forefront
Xiaomi MiMo plays the price card
On April 23rd this year, Xiaomi officially launched the public beta of the Xiaomi MiMo-V2.5 series of large models, introducing multiple versions including MiMo-V2.5, V2.5-Pro, the V2.5-TTS series, and V2.5-ASR. Among them, MiMo-V2.5-Pro mainly targets scenarios such as Agent, complex software engineering, and long-horizon tasks.
Currently, MiMo-V2.5-Pro ranks first among global open-source models in the Artificial Analysis comprehensive intelligence list and has entered the top five of the global large model list; its Agent index also ranks first among global open-source models.
On April 28th, Xiaomi further open-sourced the MiMo-V2.5 series under the MIT license, supporting commercial deployment and secondary training without additional authorization.
What really made MiMo quickly stand out in the developer community was the previous "100 trillion Token free plan".
On May 9th, Hermes Agent ("Hermes") surpassed OpenClaw ("Lobster") for the first time and topped the global application call volume list on OpenRouter. At that time, among the top five models in Hermes' monthly call volume ranking, MiMo-V2-Pro ranked first, followed by MiniMax M2.7, NVIDIA Nemotron 3 Super, Jieyue Xingchen Step 3.5 Flash, and Tencent Hy3 preview.
However, the rankings change very quickly.
As of 9 a.m. on May 27th, in OpenRouter's weekly call list, MiMo-V2.5-Pro ranked 14th.
▲OpenRouter's weekly call list (as of 9 a.m. on May 27th)
In Hermes Agent's monthly model call list, MiMo-V2-Pro has dropped from the previous top position to 16th.
▲(Source: OpenRouter, as of 9 a.m. on May 27th)
The current top three are DeepSeek-V4-Flash, the anonymous model Owl Alpha, and DeepSeek-V4-Pro.
▲(Source: OpenRouter, as of 9 a.m. on May 27th)
To some extent, this also shows the intensity of the current domestic large model competition: on one hand, the rankings and call volumes are changing more and more quickly; on the other hand, the price war is starting to directly approach DeepSeek's "rock-bottom price".
04.
Conclusion: After a double decline in revenue and profit
Xiaomi still entered the price war
Xiaomi's financial report, released yesterday, also lends this round of price cuts a sense of "big sacrifice". In the first quarter, Xiaomi's revenue was 99.1 billion yuan, a year-on-year decrease of 10.9%; the adjusted net profit was 6.1 billion yuan, a year-on-year decrease of 43.1%.
On the other hand, Xiaomi's investment in AI has not slowed down. The financial report shows that its R&D expenditure in that quarter reached 9 billion yuan, a year-on-year increase of 33.4%; as of March 31, 2026, the number of Xiaomi's R&D personnel reached 26,048, setting a new historical high.
Xiaomi's management also frequently mentioned AI during the earnings conference.