Xiaomi: Please call me the Token Price Butcher.
Those who bet on a sharp rise in the token price in 2026 were disappointed twice in just one week.
On May 22, DeepSeek announced a permanent price cut for DeepSeek V4 Pro. Overnight, Xiaomi followed by cutting the prices of its MiMo-V2.5 series by up to 99%.
At the same time, Xiaomi revised how its Token Plan is billed: the price stays the same, but the usable quota rises to 5 to 8 times the original amount.
Unsurprisingly, discussion of the MiMo price cut quickly heated up on foreign platforms such as Reddit and X, as well as on various developer forums.
Why does Xiaomi dare to cut prices at a time when the entire industry is complaining about high token costs? And where will this price cut take the AI industry?
Token prices plunge, and the AI industry gets its strictest "father"
According to Xiaomi's announcement, API prices for its MiMo-V2.5 large-model series are permanently cut by up to 99%, and tiered pricing by context length has been dropped. The new prices took effect worldwide at 0:00 local time on May 27.
However, a 99% reduction doesn't mean every call is billed at the lowest price. The decisive variable is whether the input hits the cache.
Take MiMo-V2.5-Pro as an example: on a cache hit, the input price drops to about 0.025 yuan per million tokens. On a cache miss, the input price stays at 3 yuan per million tokens, and output costs 6 yuan per million tokens.
In other words, the rock-bottom price applies only when a large share of requests hits the cache.
For applications with heavily repeated context, high-frequency agents, multi-step coding tasks, and batch inference jobs, this price is extremely attractive. If your workload has a poor cache hit rate, however, your actual cost will stay well above the floor.
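To make the dependence on cache hits concrete, here is a minimal Python sketch that blends the three MiMo-V2.5-Pro prices quoted above according to an assumed cache hit rate. The function name and the workload (100,000 input and 2,000 output tokens per call) are hypothetical, and the sketch assumes billing is simply linear in each token category.

```python
# MiMo-V2.5-Pro API prices quoted in the article, in yuan per million tokens.
PRICE_INPUT_CACHE_HIT = 0.025
PRICE_INPUT_CACHE_MISS = 3.0
PRICE_OUTPUT = 6.0


def request_cost_yuan(input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """Estimated cost of one call, assuming a fixed fraction of input tokens hits the cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (hit_tokens * PRICE_INPUT_CACHE_HIT
             + miss_tokens * PRICE_INPUT_CACHE_MISS
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100k input tokens and 2k output tokens per call.
    for rate in (0.0, 0.5, 0.9, 0.99):
        print(f"hit rate {rate:>4.0%}: {request_cost_yuan(100_000, 2_000, rate):.4f} yuan")
```

With these assumed numbers, a 90% hit rate cuts the per-call cost from 0.312 yuan to about 0.044 yuan, and at 99% the output tokens, not the input, make up most of the bill.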
The Token Plan follows a similar logic.
Xiaomi stresses that prices stay the same while credit allotments grow sharply. The monthly fees for the Lite, Standard, Pro, and Max tiers remain 39, 99, 329, and 659 yuan, while their credits rise from 60 million, 200 million, 700 million, and 1.6 billion to 4.1 billion, 11 billion, 38 billion, and 82 billion respectively.
Under the new conversion formula, MiMo-V2.5-Pro input costs only 2.5 credits/token on a cache hit and 300 credits/token on a cache miss, while output costs 600 credits/token.
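As a rough sanity check on what each tier buys, the sketch below divides each credit allotment by the per-token rates above. It assumes all credits go to a single token category, which real usage would not do. The yuan equivalent is an inference, not a figure Xiaomi states: 2.5 credits/token lines up with 0.025 yuan per million tokens (and 300 and 600 with 3 and 6 yuan), which implies about 1e-8 yuan per credit.

```python
# Token Plan tiers from the article: (monthly fee in yuan, new credit allotment).
PLANS = {
    "Lite": (39, 4_100_000_000),
    "Standard": (99, 11_000_000_000),
    "Pro": (329, 38_000_000_000),
    "Max": (659, 82_000_000_000),
}

# MiMo-V2.5-Pro conversion rates from the article, in credits per token.
CREDITS_PER_TOKEN = {
    "input_cache_hit": 2.5,
    "input_cache_miss": 300,
    "output": 600,
}

# Inferred, not published: 2.5 credits/token <-> 0.025 yuan per million tokens.
YUAN_PER_CREDIT = 0.025 / 1_000_000 / 2.5


def tokens_covered(credits: float, kind: str) -> float:
    """Tokens of one kind a credit balance pays for if spent on nothing else."""
    return credits / CREDITS_PER_TOKEN[kind]


if __name__ == "__main__":
    for name, (fee, credits) in PLANS.items():
        hit = tokens_covered(credits, "input_cache_hit")
        miss = tokens_covered(credits, "input_cache_miss")
        out = tokens_covered(credits, "output")
        print(f"{name:<8} {fee:>3} yuan/month: {hit / 1e9:.2f}B cached input, "
              f"{miss / 1e6:.1f}M uncached input, {out / 1e6:.1f}M output, "
              f"~{credits * YUAN_PER_CREDIT:.0f} yuan at API prices")
```

Under that inferred exchange rate, each tier's credits would be worth somewhat more than its monthly fee at API list prices (for example, about 41 yuan of usage for the 39-yuan Lite tier).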
This mirrors DeepSeek's strategy exactly.
A brief look back at the timeline: on April 24, DeepSeek released the preview version of V4. The next day, V4-Pro went on sale at a 75% discount. On April 26, the cache-hit price dropped to one-tenth of its original level. On May 22, the temporary discount became a permanent cut, with V4-Pro permanently priced at one-quarter of its original level.
After these adjustments, the cache-hit input price of DeepSeek-V4-Pro has fallen from 0.1 yuan to 0.025 yuan per million tokens. With Xiaomi's MiMo-V2.5-Pro quickly matching it, the cache-hit input price for Chinese models has now settled firmly at this level.
Both DeepSeek and Xiaomi are betting on cache hits and real applications, and the reason is simple: large models are moving from chat to practical applications, and agents in particular are where token consumption really surges.
In the chat scenario, the user asks a question, and the model gives an answer. The costs are relatively easy to estimate.
In the agent scenario, a single task can involve long contexts, multi-step reasoning, code generation, tool calls, web page reading, file analysis, and result verification. The user sees only the final output, but behind the scenes many requests and a great deal of context reading may already have taken place.
This is why cache hits are so important.
Agents, coding assistants, and long-context applications share one feature: much of the content repeats, such as system prompts, project code, API documentation, tool descriptions, conversation history, and dependency files. Recomputing all of this on every request is very expensive. If it can be served from the cache, however, reuse is billed only at the cache-hit price, and inference costs fall significantly.
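The effect described above can be sketched with a toy agent loop in which every step resends the whole conversation so far. This is a simplified model built on assumptions: the cache is taken to cover exactly the previous request's input (a prefix cache), the model's own replies are billed as new input on the next step, and the prices are the MiMo-V2.5-Pro list prices quoted earlier. The function and parameter names are hypothetical.

```python
# MiMo-V2.5-Pro list prices quoted in the article, in yuan per million tokens.
PRICE_HIT = 0.025
PRICE_MISS = 3.0
PRICE_OUT = 6.0


def agent_task_cost(steps: int, prefix_tokens: int, new_tokens_per_step: int,
                    output_tokens_per_step: int, use_cache: bool) -> float:
    """Yuan cost of a toy agent task that resends its full history on every step.

    Assumptions: the cache covers exactly the previous request's input (prefix
    caching), and each reply is billed as fresh input when it is resent next step.
    """
    total = 0.0
    previous_input = 0          # input length of the prior request, i.e. what the cache can cover
    history = prefix_tokens     # system prompt, tool descriptions, files, plus earlier turns
    for _ in range(steps):
        input_tokens = history + new_tokens_per_step
        hit = previous_input if use_cache else 0
        miss = input_tokens - hit
        total += hit * PRICE_HIT + miss * PRICE_MISS + output_tokens_per_step * PRICE_OUT
        previous_input = input_tokens
        history = input_tokens + output_tokens_per_step  # the reply joins the next prompt
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical task: 20 steps over a 30k-token prefix of prompts, tool specs, and code.
    for cached in (False, True):
        cost = agent_task_cost(20, 30_000, 2_000, 1_000, use_cache=cached)
        print(f"cache {'on ' if cached else 'off'}: {cost:.3f} yuan")
```

Because the resent history grows every step while only the newest tokens miss the cache, the gap between the two totals widens as tasks get longer.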
In other words, the lower the cache-hit price, the better suited a model is to high-frequency, multi-step, long-running real-world work. Behind the low prices of DeepSeek and Xiaomi lies the intention to attract developers and high-frequency applications, so that more agents, coding assistants, and office automation tools run on their models.
In the same vein, Xiaomi previously encouraged more people to test MiMo on real-world problems through initiatives such as MiMo Orbit and the 100-Trillion Token Creator Incentive Program. The incentive program started on April 28, and by 16:08 on May 26, all 100T tokens had been distributed ahead of schedule.
From the platform's perspective, low token prices and free quotas translate into a huge volume of real calls. Real calls bring complex tasks, failure cases, user feedback, agent workflows, coding scenarios, and long-context data, which in turn help improve both the model and the inference system.
The "shrimp farmer" phenomenon in the community can be understood in the same light: by using up their quotas as fully as possible, users help the platform stress-test its systems, uncover problems, and collect call data.
Therefore, gross profit per inference is not the only metric. Short-term revenue losses are offset by developer migration, call volume, and real feedback. For model providers who want to secure a position in the agent ecosystem, this is a very worthwhile investment in the platform.
Luo Fuli's "it tastes good after all" U-turn rests on engineering capability
However, intention alone is not enough; a provider must also be able to afford the cut. What makes Xiaomi's move notable is that it appears to contradict earlier public statements by Luo Fuli, head of the MiMo large-model team.
A month ago, Luo Fuli publicly spoke out against a token price war. She argued that low token prices combined with opening up to third-party agent frameworks could easily push a platform into a cost crisis.
She noted that third-party agent frameworks often manage context crudely. A single user request can trigger several wasteful tool calls, each carrying a context of more than 100,000 tokens. If the platform cannot curb this waste, the actual API cost can be many times the subscription price.
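Her concern can be put into numbers with a back-of-the-envelope sketch: how many user requests it takes for the API-equivalent cost to reach a monthly subscription fee. The inputs are illustrative assumptions (10 tool calls per request, 100,000 tokens of context per call, the 39-yuan Lite fee, and the post-cut MiMo-V2.5-Pro prices), output tokens are ignored, and the function name is hypothetical.

```python
# Post-cut MiMo-V2.5-Pro input prices from the article, in yuan per million tokens.
PRICE_HIT = 0.025
PRICE_MISS = 3.0


def requests_until_exhausted(subscription_yuan: float, tool_calls_per_request: int,
                             context_tokens_per_call: int, cache_hit_rate: float) -> float:
    """User requests whose input cost (at API list prices) adds up to the subscription fee.

    Simplification: every tool call resends the full context and output is ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    # Blended input price in yuan per million tokens for the assumed hit rate.
    blended = cache_hit_rate * PRICE_HIT + (1.0 - cache_hit_rate) * PRICE_MISS
    cost_per_request = tool_calls_per_request * context_tokens_per_call * blended / 1_000_000
    return subscription_yuan / cost_per_request


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9):
        n = requests_until_exhausted(39, 10, 100_000, rate)
        print(f"hit rate {rate:.0%}: ~{n:.0f} requests to use up 39 yuan")
```

With no cache hits, those assumptions exhaust 39 yuan of value in 13 requests; at a 90% hit rate the same budget stretches to roughly 120 requests, which is why context management and caching decide whether a flat-rate plan is viable.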
She also argued that global computing power can no longer keep up with the token demand generated by agents. Until large-model companies understand the cost structure of programming and agent workloads, a blind price war will lead to access restrictions, degraded performance, and poorer stability, ultimately hurting the user experience.
Xiaomi's price cut, however, does not so much refute her earlier assessment as change the preconditions for a price war. Luo Fuli opposed low prices without a stable cost structure; what Xiaomi is showing now is an engineering approach that can sustain low prices.
According to Xiaomi's announcement, its technical team has implemented full support for SWA (Sliding Window Attention) on top of SGLang HiCache. As a result, KV cache traffic between storage tiers such as GPU memory, CPU memory, and SSD has fallen to roughly one-seventh of its previous level, and the number of cacheable tokens has grown to almost five times its previous value.
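One plausible reason SWA-aware caching saves so much: a sliding-window layer only ever attends to the last `window` tokens, so a cache that treats every layer as full attention stores and moves KV entries those layers will never read again. The sketch below counts stored (layer, token) KV entries for a hypothetical hybrid model; the layer counts, window size, and equal per-layer KV size are assumptions for illustration, not MiMo's actual configuration, and the sketch does not attempt to reproduce Xiaomi's reported figures.

```python
def kv_entries_stored(seq_len: int, n_full_layers: int, n_swa_layers: int, window: int) -> int:
    """(layer, token) KV entries needed for one sequence.

    Full-attention layers keep every token; sliding-window layers only need the
    last `window` tokens, so their share stops growing once seq_len exceeds window.
    """
    return n_full_layers * seq_len + n_swa_layers * min(seq_len, window)


def hybrid_to_full_ratio(seq_len: int, n_full_layers: int, n_swa_layers: int, window: int) -> float:
    """SWA-aware KV footprint relative to caching every layer as full attention."""
    hybrid = kv_entries_stored(seq_len, n_full_layers, n_swa_layers, window)
    all_full = kv_entries_stored(seq_len, n_full_layers + n_swa_layers, 0, window)
    return hybrid / all_full


if __name__ == "__main__":
    # Hypothetical hybrid layout: 1 full-attention layer per 5 sliding-window layers.
    for seq_len in (4_096, 32_768, 131_072):
        ratio = hybrid_to_full_ratio(seq_len, n_full_layers=1, n_swa_layers=5, window=128)
        print(f"{seq_len:>7} tokens: hybrid cache holds {ratio:.1%} of full-attention KV entries")
```

With one full-attention layer per five sliding-window layers and a 128-token window, a 32,768-token context needs only about 17% of the KV entries that an all-full-attention cache would hold.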
Xiaomi also optimized its expert-parallelism scheme and its strategy for classifying requests by input length, raising the cluster's input throughput. Without this technical capability, a low-price strategy would easily turn into an unsustainable subsidy; only strong infrastructure can turn a low price into a lasting advantage.
A price war tests engineering ability, and it also tests the strength of a company's backing.
Unlike pure AI model companies, Xiaomi can draw on its smartphone, automotive, IoT, and consumer electronics businesses, which give it a longer investment horizon and more strategic patience. It can treat its large-model service as an entry point into the AI ecosystem and avoid the trap of focusing only on short-term API revenue.
This is bad news for small and medium-sized model companies. Companies without a core business line, solid infrastructure, or enough call volume to spread costs cannot sustain these prices in the long run.
DeepSeek's low prices already threaten the market positions of many domestic and international models. With Xiaomi MiMo's price cut, more sizable companies will be forced to adjust their prices or redefine their products' value proposition, while smaller model providers may be pushed into narrower niche markets.