DeepSeek sets the questions, and Xiaomi submits the answers.
Early yesterday morning, an announcement about Xiaomi's MiMo large model caused quite a stir in the developer community.
The theme of the announcement was simple: a price cut, and a significant one at that.
The prices of the MiMo-V2.5 series APIs have been permanently cut, by as much as 99%, and pricing no longer varies with context window length. The Credits system introduced in the previous billing scheme remains, and the quota available in each package has increased roughly 5 to 8 times.
From a broader perspective, this is not just a promotion: it is the first lightning-fast follow-up from the domestic large-model camp, coming just four days after DeepSeek V4 reset the industry's lowest price.
Behind this seemingly inclusive strategy lie the harsh survival conditions facing domestic models and a profound misunderstanding about the value of tokens.
01
DeepSeek Sets the Question, Xiaomi Provides the Answer
Many AI industry observers were immediately impressed by Xiaomi's swift action. Just after Liang Wenfeng's DeepSeek "knocked down" the API prices, Xiaomi quickly matched the prices of the V4 Pro and V4 Flash models.
This sends a very clear signal: as I've always believed, the second price war in the domestic AI industry has already arrived and has quietly entered the "red-ocean melee" stage.
Objectively speaking, everyone must admit a fact: at this stage, domestic models still have a generation gap that is difficult to bridge in the short term compared to GPT-5.5 and Opus 4.7. In my opinion, this gap will continue to widen in the future.
In the field of "top-tier intelligence", domestic models are still making all-out efforts to catch up. However, in the large-scale application scenarios of "non-complex tasks", there is no absolute gap in the intelligence level among domestic models.
When there is no generation gap in intelligence level, cost-effectiveness (ROI) becomes the only moat.
DeepSeek, led by Liang Wenfeng, has shown through its aggressive pricing and several consecutive price cuts that, for a model still in China's first echelon on performance, low prices are the most effective way to attract traffic and create "substitute dependence" among users.
Xiaomi's rapid follow-up confirms another piece of logic: in this competition, those who do not follow can only watch their users leave. The steep price cuts by two leading domestic model companies are enough to show that some manufacturers' subscription and API prices are inflated. This is no longer a question of "whether to cut prices" but a matter of survival: "no price cut, no survival".
02
The Mathematical Game Behind 11 Billion Credits
Although the prices have changed dramatically, the Credits billing unit used by Xiaomi when launching the subscription service remains unchanged.
From a marketing perspective, this is indeed a simple and smart move: paying 99 yuan for 1.3 billion tokens sounds quite cost-effective, while paying 99 yuan for 11 billion Credits sounds like a great deal.
The impact of these large numbers goes a long way toward easing users' concerns about "whether the service has been downgraded after the price cut". Still, it is worth calming down and doing some calculations to see the business logic behind them.
In terms of API prices, the reason why Xiaomi can claim a maximum price cut of 99% is mainly because its original pricing seemed too traditional and conservative under the impact of DeepSeek. To match the prices and prevent users from being snatched away instantly, the API prices had to be cut drastically.
For the subscription service, Xiaomi pioneered the Token Plan in China. This more transparent and interpretable billing method is gradually becoming the global mainstream. Xiaomi officially claims that capacity has increased by about 5 to 8 times.
Taking the most subscribed, most commonly used, and cheapest Lite-tier subscription as an example, the available token quota has increased from 60M to 500M. After conversion, the unit cost has decreased by about 88%, slightly less than the API price cut. For higher-tier subscriptions, the cost reduction is even smaller.
This difference is easy to understand because the subscription service is essentially a "wholesale price" and has always been much more cost-effective than using the API directly. Whether at home or abroad, users prefer to subscribe when a subscription service is available.
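The 88% figure can be checked with a few lines of arithmetic. The sketch below assumes the Lite-tier price itself is unchanged, so that only the quota moves; that assumption is implied rather than stated in the article, and the function name is purely illustrative.

```python
def unit_cost_reduction(old_quota: float, new_quota: float) -> float:
    """Fractional drop in cost per token when the price stays fixed
    (an assumption) and only the quota grows."""
    if old_quota <= 0 or new_quota <= 0:
        raise ValueError("quotas must be positive")
    # Cost per token scales as 1 / quota, so the new unit cost is
    # old_quota / new_quota of the old one.
    return 1.0 - old_quota / new_quota


# Lite-tier quotas quoted in the article: 60M before, 500M after.
lite_multiple = 500e6 / 60e6
lite_reduction = unit_cost_reduction(60e6, 500e6)

print(f"quota multiple: {lite_multiple:.1f}x")
print(f"unit cost reduction: {lite_reduction:.0%}")  # 88%
```

The same function also shows why a tier whose quota grows only 5 times sees a smaller cut: `unit_cost_reduction(1, 5)` is 80%.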
Therefore, the purpose of Xiaomi's series of measures is clear: attract traffic by matching DeepSeek's API prices, and then lock in high-frequency users through the subscription service. Even though the subscription discount is not as dramatic as the API discount, DeepSeek does not offer a subscription service, so Xiaomi's Token Plan is currently the most cost-effective "computing power package" on the market.
This differentiated design actually guides user behavior: it encourages users to make high-frequency, repetitive agent calls, because only in such scenarios can Xiaomi achieve its lowest cost and users feel the lowest price.
03
The Same Price, Different Values
When prices are leveled, the only indicator that determines the winner becomes the productivity value of tokens.
According to the evaluation by Artificial Analysis and the feedback from actual tests, Xiaomi's MiMo V2.5 Pro and DeepSeek V4 Pro have different value orientations.
DeepSeek is more like a specialist with lopsided strengths. It has a slight lead in programming and logical reasoning and has been more successful in capturing users' mindshare. Currently, it is the first choice for many individual developers and small development companies. However, DeepSeek's lack of multi-modality seriously limits the range of application scenarios. Its current expert-mode image recognition is only marginally useful.
Xiaomi has built an all-around player. When the model was released, it was clearly labeled "full-modality". At the same API price, Xiaomi's tokens can handle complex forms of interaction such as images, audio, and video. Compared with DeepSeek, which can only handle text, it has the advantage in agent applications.
This is also a point I've repeatedly emphasized: in the era of agents, multi-modality should not be ignored but should be given more attention.
Then, where does Xiaomi get the confidence to cut prices? The technical details mentioned in the announcement vaguely reveal how Xiaomi reduces the physical cost of each token.
The two terms SGLang HiCache and SWA (Sliding Window Attention) are worth paying attention to. Simply put, Xiaomi believes that in the inference process of large models, the most expensive part is the KV Cache in the GPU memory.
SWA means the model no longer spends large amounts of memory remembering useless words from tens of thousands of words ago, which explains why Xiaomi has dropped tiered billing based on context window length. The multi-level storage optimization has reduced data transfer between GPU memory, system memory, and SSD to one-seventh of the original.
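The memory argument for SWA can be made concrete with a rough KV cache estimate. The sketch below is only an illustration: the layer count, KV head count, and head dimension are hypothetical placeholders rather than MiMo's real configuration, and it assumes every layer uses the sliding window, which may not match the actual architecture.

```python
from typing import Optional


def kv_cache_bytes(
    context_tokens: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # fp16/bf16 storage
    window: Optional[int] = None,
) -> int:
    """Approximate KV cache size for a single sequence."""
    # With a sliding window, only the most recent `window` tokens stay cached.
    cached_tokens = context_tokens if window is None else min(context_tokens, window)
    # Factor 2: one key vector and one value vector per token, layer, and KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * cached_tokens


# Hypothetical model shape, chosen only to make the numbers easy to read.
HYPOTHETICAL = dict(n_layers=32, n_kv_heads=8, head_dim=128)

full = kv_cache_bytes(131_072, **HYPOTHETICAL)
swa = kv_cache_bytes(131_072, window=4_096, **HYPOTHETICAL)
swa_longer = kv_cache_bytes(1_000_000, window=4_096, **HYPOTHETICAL)

print(f"full attention, 128K context: {full / 2**30:.1f} GiB")  # 16.0 GiB
print(f"sliding window, 128K context: {swa / 2**20:.0f} MiB")   # 512 MiB
print(f"sliding window cost is flat:  {swa == swa_longer}")     # True
```

Under these assumptions the cache stops growing once the context exceeds the window, which is consistent with charging the same price regardless of context length.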
Technological leadership ultimately translates into pricing freedom.
When Xiaomi can reduce the cache hit cost to one-tenth or even one-hundredth of that of the previous-generation model, a 99% price cut is neither charity nor a simple marketing move. It is about releasing technological dividends and weeding out competitors with outdated technical architectures and high costs.
04
Beware of Token Monetization, Intelligence is the Real Value
Finally, regardless of the price cuts by DeepSeek or Xiaomi, everyone in the AI industry should pay attention to a deep-seated problem in the industry.
In the current AI market, the term "token" seems to have been distorted into a kind of "standard currency". In the past two months, some companies have started to evaluate employees based on "how many tokens they consume each month", and developers have started to show off their token usage.
But this is a misunderstanding: tokens are not currency, and the value of tokens in different models is completely different.
Top-tier models like GPT-5.5 and Opus 4.7 have high-value tokens because they can complete complex tasks with a small number of tokens, giving them extremely high productivity density.
For low-intelligence models, even if tokens are supplied by the billion, their productivity value is still close to zero if they cannot solve problems.
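One way to express this point is to price a solved task rather than a token. The sketch below is an illustrative model, not a measurement: every price, token count, and success rate in it is hypothetical, and it assumes failed attempts are simply retried independently, so the expected number of attempts is 1 / success_rate.

```python
import math


def cost_per_solved_task(
    price_per_million_tokens: float,
    tokens_per_attempt: int,
    success_rate: float,
) -> float:
    """Expected spend to get one task solved, assuming independent retries."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate == 0.0:
        # A model that never solves the task costs unboundedly much per solution.
        return math.inf
    cost_per_attempt = price_per_million_tokens * tokens_per_attempt / 1_000_000
    return cost_per_attempt / success_rate


# Hypothetical numbers only: an expensive, token-efficient model versus
# cheap models that burn many more tokens per attempt.
strong = cost_per_solved_task(30.0, 20_000, 0.9)
cheap_but_capable = cost_per_solved_task(1.0, 200_000, 0.5)
cheap_but_failing = cost_per_solved_task(1.0, 200_000, 0.0)

print(strong, cheap_but_capable, cheap_but_failing)
```

In this toy setup a cheap model can still win on simple tasks it reliably solves, while on tasks it cannot solve no token price is low enough.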
Recently, many domestic and foreign manufacturers have raised the prices of their subscription services and APIs, riding the popularity of coding agent software. In essence, they are exploiting the ambiguity of the token concept to make users unfamiliar with AI believe that tokens from all models are the same raw material.
Now, DeepSeek has disrupted the market, and Xiaomi has firmly blocked the exit. The essence of what these two companies are doing is to return tokens to their real value: as a "cyber industrial consumable", a token must be cheap enough to support large-scale AI applications.
The second price war of large models has quietly begun, and once prices are lowered this time, it will not be as easy to raise them as it was after the previous price war. For manufacturers that still hold on to high prices but cannot offer top-tier intelligence, winter may be closer than expected.
Finally, the conclusion in Xiaomi's announcement is worth sharing with everyone: The value of technology should ultimately be reflected in its widespread use.
When tokens are no longer expensive, domestic large models can truly transform from laboratory samples into something that everyone can use as needed, like water and electricity.
And the reshuffle of intelligent value has just begun.
This article is from the WeChat official account "Silicon-based Starlight", author: Si Qi. Republished by 36Kr with permission.