Up to 99% Off: Xiaomi Permanently Cuts Large Model API Prices

This time, Xiaomi has dropped its traditional pricing tiers based on context window length and revised its Token Plan billing system.

On May 27th, Xiaomi Technology announced a permanent price cut for its MiMo-V2.5 series APIs, with reductions of up to 99%. This makes Xiaomi the latest large model company, after DeepSeek, to announce a permanent API price cut.

Xiaomi's price cut takes effect today. It not only abolishes the traditional practice of pricing by context window length but also revises the Token Plan billing system. For the same payment, the token allowance rises to 5 to 8 times the original amount, offering better cost-effectiveness.

In detail, after the adjustment MiMo-V2.5-Pro charges only 0.025 yuan per million tokens for input cache hits. That is 98% below the original price of 1.40 yuan for the ≤256k tier and 99% below the 2.80 yuan charged for the 256k-1M tier. Input without a cache hit costs 3.00 yuan per million tokens, down 57% from the original 7.00 yuan and down 79% from the long-window price of 14.00 yuan. Output costs 6 yuan per million tokens, down 71% and 86% from the original prices of 21 yuan and 42 yuan respectively.

The price cut for the standard MiMo-V2.5 is also significant. Input cache hits now cost 0.020 yuan per million tokens, 96% below the original price of 0.56 yuan for the ≤256k tier and 98% below the 1.12 yuan charged for the 256k-1M tier. Input without a cache hit costs 1.00 yuan per million tokens, down 64% from the original 2.80 yuan and down 82% from the long-window price of 5.60 yuan. Output costs 2 yuan per million tokens, down 86% and 93% from the original prices of 14 yuan and 28 yuan respectively.
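The percentages quoted for both models can be checked with a few lines of arithmetic. The following is a minimal sketch, not Xiaomi's own calculation: the prices are copied from the figures above, while the table layout and the function names (`reduction_pct`, `reduction_table`) are illustrative assumptions.

```python
# Prices in yuan per million tokens, copied from the figures quoted above.
# Each entry: (new price, original <=256k price, original 256k-1M price).
PRICES = {
    "MiMo-V2.5-Pro": {
        "input_cache_hit": (0.025, 1.40, 2.80),
        "input_cache_miss": (3.00, 7.00, 14.00),
        "output": (6.00, 21.00, 42.00),
    },
    "MiMo-V2.5": {
        "input_cache_hit": (0.020, 0.56, 1.12),
        "input_cache_miss": (1.00, 2.80, 5.60),
        "output": (2.00, 14.00, 28.00),
    },
}


def reduction_pct(new: float, old: float) -> int:
    """Percentage price reduction, rounded to the nearest whole percent."""
    return round((1 - new / old) * 100)


def reduction_table(prices=PRICES):
    """Map (model, item) to (reduction vs <=256k tier, reduction vs 256k-1M tier)."""
    return {
        (model, item): (reduction_pct(new, short), reduction_pct(new, long))
        for model, items in prices.items()
        for item, (new, short, long) in items.items()
    }


if __name__ == "__main__":
    for (model, item), (short, long) in reduction_table().items():
        print(f"{model:14s} {item:17s} -{short}% / -{long}%")
```

Rounding to whole percentages reproduces the figures in the announcement, including the headline 99% for Pro cache hits against the old 256k-1M tier.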

It is worth noting that this price adjustment focuses on the core MiMo-V2.5 series. The MiMo-V2.5-TTS series keeps its limited-time free access policy. The API prices of the two high-end models, MiMo-V2-Pro and MiMo-V2-Omni, remain unchanged. At the same time, their Token Plan packages are excluded from the adjustment and will soon be taken offline, steering developers toward the more cost-effective V2.5 series.

Development of the MiMo-V2.5 series is led by Luo Fuli, a leading figure in Xiaomi's AI efforts. In November 2025, Luo Fuli, an AI talent born after 1995 who previously worked at DeepSeek, officially joined Xiaomi as head of the MiMo large model and formed an R&D team with an average age of 25, more than 60% of whose members graduated from Tsinghua University or Peking University. There are reports that Lei Jun recruited Luo Fuli to Xiaomi with an annual salary in the tens of millions.

Under Luo Fuli's direction, Xiaomi's MiMo large model has gone through several generations of rapid iteration. In March this year, three base models, MiMo-V2-Pro, MiMo-V2-Omni, and MiMo-V2-TTS, were officially launched. The upgraded V2.5 version followed, rounding out full-scenario capabilities such as high-performance reasoning, lightweight general interaction, and voice synthesis, and becoming the mainstay of Xiaomi's large model lineup for the affordable commercial market.

Xiaomi has now built a relatively complete MiMo product matrix. MiMo-V2.5-Pro focuses on high-performance complex reasoning and suits high-end commercial scenarios such as enterprise-level agent development and in-depth business analysis. MiMo-V2.5 focuses on lightweight general needs, targeting the everyday API calls of small and medium-sized developers and the deployment of lightweight applications. MiMo-V2.5-TTS targets voice synthesis, using a free strategy to capture the entry point to the audio ecosystem. In addition, MiMo-V2-Pro, the flagship base model, focuses on performance breakthroughs with a trillion-parameter MoE architecture, and MiMo-V2-Omni focuses on full-modality fusion.

Just yesterday, Lei Jun, the founder of Xiaomi, said, "Xiaomi MiMo-V2.5-Pro ranks first among global open-source models in the comprehensive intelligence index and Agent index of the Artificial Analysis list. Xiaomi plans to invest 60 billion yuan in the AI field in the next three years."

Before Xiaomi announced its price cut, another leading domestic large model company, DeepSeek, had already taken the lead with a "permanent price cut", mainly targeting the DeepSeek-V4-Pro model. Once the limited-time discount for the corresponding API ends on May 31st, the overall price will be set at one-quarter of the original, a 75% reduction. After the adjustment, input cache hits cost 0.025 yuan per million tokens, input without a cache hit costs 3 yuan per million tokens, and output costs 6 yuan per million tokens.
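To put the per-million-token rates in concrete terms, the sketch below estimates the bill for a workload at the post-cut DeepSeek-V4-Pro prices. The three rates come from the paragraph above; the original prices are not quoted in the announcement and are only implied here by the "one-quarter" statement, and the workload size and the `workload_cost` helper are hypothetical, chosen purely for illustration.

```python
# Post-cut DeepSeek-V4-Pro prices quoted above, in yuan per million tokens.
NEW_PRICES = {"cache_hit": 0.025, "cache_miss": 3.0, "output": 6.0}

# Assumption: the new price is one-quarter of the original across the board,
# so the implied original prices are four times the new ones.
IMPLIED_ORIGINAL = {item: price * 4 for item, price in NEW_PRICES.items()}


def workload_cost(prices, hit_tokens, miss_tokens, output_tokens):
    """Total cost in yuan for the given token counts at per-million-token prices."""
    return (
        hit_tokens * prices["cache_hit"]
        + miss_tokens * prices["cache_miss"]
        + output_tokens * prices["output"]
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 8M cached input, 2M uncached input, 1M output tokens.
    workload = (8_000_000, 2_000_000, 1_000_000)
    new = workload_cost(NEW_PRICES, *workload)
    old = workload_cost(IMPLIED_ORIGINAL, *workload)
    print(f"after cut: {new:.2f} yuan, implied before: {old:.2f} yuan")
```

Under these assumptions the workload costs roughly 12.2 yuan after the cut. Because cache hits are priced at less than 1% of uncached input, the share of input served from cache has a large effect on the total.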

DeepSeek-V4 was released in late April this year. It offers an ultra-long context of one million characters and leads the domestic and open-source fields in Agent capabilities, world knowledge, and reasoning performance. It comes in two models, DeepSeek-V4-Flash and DeepSeek-V4-Pro. The call cost of DeepSeek-V4-Pro is much lower than that of mainstream international models such as GPT-4o and Claude, which has quickly won over developers and enterprise users and given manufacturers such as Xiaomi a market template for their own price cuts.

Beyond Xiaomi and DeepSeek, the domestic large model market is showing a clear K-shaped divergence. General-purpose large models such as Alibaba Cloud Tongyi Qianwen and ByteDance Doubao have successively lowered their API call prices, while models such as Zhipu GLM and Tencent Hunyuan, which focus on enterprise-customized services, have held prices steady or even raised them slightly. The result is a new pattern of "general models cutting prices for volume, high-end models holding their premium".

Behind this phenomenon is the industry's shift from disorderly price wars to competition on technical efficiency. Price cuts are no longer merely a marketing tactic but an inevitable result of optimized underlying algorithms, upgraded inference technology, and falling computing power costs.

The 2026 AI API Infrastructure Report released by the AI aggregation platform AI.cc shows that in the past year, the Token call cost of enterprise-level large models has plummeted by 67% year-on-year, and open-source models have accounted for 38% of the enterprise Token call volume. Cost-effectiveness has become the core competitive factor in the market.

This article is from the WeChat official account "Jiemian News". The author is Song Jianan. It is published by 36Kr with authorization.

The views in this article are solely those of the author. The 36Kr platform only provides information storage services.