Surpassing DeepSeek-V4: Luo Fuli unveils Xiaomi's most powerful open-source model, adapted to five domestic chips on day one.
Zhidx reported on April 28 that Xiaomi has just open-sourced the MiMo-V2.5 series of models, developed under the leadership of Luo Fuli. The models are released under the MIT license, allowing commercial inference deployment and secondary training without additional authorization.
▲ Screenshot of the open-source page of MiMo-V2.5-Pro on Hugging Face
The public beta of this series launched earlier, on April 23, with two models: MiMo-V2.5-Pro and MiMo-V2.5. Both offer stronger agent capabilities, support a context of up to 1 million tokens, and deliver significantly improved token efficiency.
The complete benchmark results for MiMo-V2.5-Pro were announced today. Xiaomi claims it outperforms the latest open-source DeepSeek-V4-Pro in multiple evaluations such as GDPVal-AA (Elo) and Claw-Eval (pass^3), and also surpasses mainstream closed-source models such as the recently released Kimi K2.6, achieving the best overall performance.
▲ The latest evaluation results of MiMo-V2.5-Pro
On the first day of open-sourcing, MiMo-V2.5-Pro completed access and adaptation with multiple chip manufacturers, including Alibaba T-Head, Amazon Web Services, AMD, Baidu Kunlun Chip, Enflame Technology, Muxi, and Daysci. The MiMo-V2.5 series also completed Day 0 adaptation with the mainstream inference frameworks SGLang and vLLM.
Meanwhile, Xiaomi launched the 100 Trillion Token Creator Incentive Program, planning to give away a total of 100 trillion tokens' worth of usage rights within 30 days. It also launched the Agent Ecosystem Co-construction Program and has already partnered with agent framework makers such as OpenCode, Hermes Agent, and KiloCode.
Model weight collection:
https://huggingface.co/collections/XiaomiMiMo/mimo-v25
For more details, refer to the model blog:
https://mimo.xiaomi.com/index#blog
Application website for the 100 Trillion Token Program:
https://100t.xiaomimimo.com/
01.
Release of Model Technical Details
Outperforming DeepSeek-V4 in Evaluations
According to the newly released model card, MiMo-V2.5-Pro, the most powerful model Xiaomi has developed to date, is a mixture-of-experts (MoE) model with 1.02 trillion (1.02T) total parameters and 42 billion (42B) active parameters. Built on a hybrid attention architecture, it achieves significant improvements over its predecessors in general intelligence, complex software engineering, and long-horizon task processing.
MiMo-V2.5-Pro inherits the hybrid attention mechanism and multi-token prediction (MTP) design of MiMo-V2-Flash. Local sliding-window attention (SWA) and global attention (GA) layers alternate at a ratio of 6:1, with a window size of 128 tokens. For long contexts, the key-value cache footprint is cut to nearly one-seventh through a learnable attention pool bias while preserving performance. A lightweight MTP module built on a dense feed-forward network (FFN) is natively integrated for both training and inference, raising output throughput roughly threefold and accelerating reinforcement-learning (RL) deployment.
▲ The model architecture and training process of MiMo-V2.5-Pro
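As a rough illustration (not Xiaomi's code: the total layer count and caching details below are assumptions, only the 6:1 ratio and 128-token window come from the article), the interleaving of sliding-window and global attention layers, and the resulting long-context KV-cache savings, can be sketched as:

```python
# Sketch of a 6:1 local/global hybrid attention layout (hypothetical layer
# count). Only global-attention (GA) layers cache keys/values for the full
# context; sliding-window (SWA) layers cache at most `window` tokens.

WINDOW = 128      # sliding-window size in tokens (from the article)
RATIO = 6         # SWA layers per GA layer (from the article)
NUM_LAYERS = 42   # hypothetical total layer count

def layer_types(num_layers, ratio):
    """Return 'swa'/'ga' per layer: `ratio` SWA layers, then one GA layer."""
    return ["ga" if (i + 1) % (ratio + 1) == 0 else "swa"
            for i in range(num_layers)]

def kv_cache_tokens(num_layers, ratio, context_len, window):
    """Total cached tokens across layers vs. an all-global baseline."""
    types = layer_types(num_layers, ratio)
    hybrid = sum(context_len if t == "ga" else min(window, context_len)
                 for t in types)
    baseline = num_layers * context_len
    return hybrid, baseline

hybrid, baseline = kv_cache_tokens(NUM_LAYERS, RATIO,
                                   context_len=1_000_000, window=WINDOW)
print(f"cache reduction: {baseline / hybrid:.1f}x")
```

With these assumed numbers the long-context cache shrinks by roughly 7x, consistent with the figure quoted above; the real savings also depend on caching details that are not public.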
The model was pre-trained on 27 trillion (27T) tokens with FP8 mixed precision and a native sequence length of 32K, with the context later extended to 1 million tokens. Post-training follows the three-stage paradigm introduced with MiMo-V2-Flash: 1. supervised fine-tuning on carefully curated data pairs to establish basic instruction following; 2. domain-specific training, in which separate teacher models are optimized via reinforcement learning for individual domains such as mathematics, safety, and agentic tool use; 3. multi-teacher on-policy distillation (MOPD), in which a single student model learns from its own rollouts under token-level guidance from each specialized teacher, consolidating all teachers' capabilities into one unified model.
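The distillation stage can be pictured with a toy example (all distributions, the 3-token vocabulary, and the sample-to-teacher routing rule here are hypothetical): for each training sample, the student's next-token distribution is pulled toward that of the domain teacher owning the sample, via a token-level KL term:

```python
import math

# Toy sketch of multi-teacher token-level distillation: each sample is
# routed to a domain teacher, and the student is penalized by the KL
# divergence from that teacher's next-token distribution.

def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

def mopd_loss(student_probs, teacher_probs_by_domain, domains):
    """Average token-level KL(teacher || student), teacher chosen per sample."""
    total = 0.0
    for s, domain in zip(student_probs, domains):
        total += kl(teacher_probs_by_domain[domain], s)
    return total / len(student_probs)

teachers = {
    "math":  [0.8, 0.1, 0.1],   # math teacher's next-token distribution
    "tools": [0.1, 0.8, 0.1],   # tool-use teacher's next-token distribution
}
student = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]]
loss = mopd_loss(student, teachers, ["math", "tools"])
print(f"distillation loss: {loss:.4f}")
```

The loss shrinks as the student's per-token distributions approach the relevant teacher's, which is what lets one student absorb several specialists without averaging their policies together.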
Now for MiMo-V2.5. It is a sparse MoE model with 310 billion (310B) total parameters and 15 billion (15B) active parameters, trained on 48 trillion (48T) tokens. Its language backbone inherits the hybrid sliding-window attention mechanism of MiMo-V2-Flash and is paired with in-house pre-trained vision and audio encoders; the two encoders are fused with the language model through a lightweight projection module.
▲ The architecture of MiMo-V2.5
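A minimal sketch of what such a lightweight projection module does (the dimensions, weights, and token counts here are made up for illustration; the real module's design is not described in detail): encoder outputs are mapped into the language model's hidden size by a linear layer, then concatenated with the text-embedding sequence.

```python
import random

random.seed(0)

# Toy projection module: map encoder tokens (width ENC_DIM) into the
# language model's hidden size (LM_DIM), then prepend them to the text
# embeddings so the LM attends over one fused sequence.

ENC_DIM, LM_DIM = 4, 6   # tiny stand-ins for real encoder/LM widths

def linear(x, w, b):
    """y = W x + b for one vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def project_and_fuse(encoder_tokens, text_tokens, w, b):
    """Project each encoder token to LM_DIM, then prepend to the text sequence."""
    projected = [linear(t, w, b) for t in encoder_tokens]
    return projected + text_tokens

w = [[random.gauss(0, 0.1) for _ in range(ENC_DIM)] for _ in range(LM_DIM)]
b = [0.0] * LM_DIM
vision = [[1.0] * ENC_DIM for _ in range(3)]   # 3 "image patch" tokens
text = [[0.0] * LM_DIM for _ in range(5)]      # 5 text-embedding tokens

fused = project_and_fuse(vision, text, w, b)
print(len(fused), len(fused[0]))   # sequence length and hidden width
```

Keeping the projector this light is what makes stage-2 alignment training (below) cheap: only the projection weights need to learn, while both encoders and the language backbone stay frozen or lightly tuned.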
Training proceeds in five stages: 1. text pre-training on diverse corpora to build the language-model backbone; 2. projection-layer pre-training to align the vision and audio projectors with the language model; 3. large-scale multimodal pre-training on high-quality cross-modal datasets; 4. supervised fine-tuning and agentic post-training, during which the context window is progressively expanded from 32K to 256K and finally to 1 million tokens; 5. reinforcement learning (RL) and multi-teacher on-policy distillation (MOPD) to further strengthen the model's perception, logical reasoning, and agent-execution capabilities.
According to the latest evaluation results released by Xiaomi, MiMo-V2.5 significantly outperforms DeepSeek's recently released DeepSeek-V4-Flash on multiple evaluations, including Claw-Eval Text, Terminal-Bench 2.0, and SWE-Bench Pro.
▲ The latest evaluation results of MiMo-V2.5
02.
Adaptation with 7 Chip Manufacturers, Including Alibaba T-Head and Muxi, Completed on Day One of Open-Sourcing
Xiaomi also announced the latest adaptation status for the chip ecosystem and inference frameworks. On the first day of open-sourcing, MiMo-V2.5-Pro completed access and adaptation with multiple chip manufacturers:
Alibaba T-Head: Achieved in-depth adaptation based on the Zhenwu 810E and the full-stack self-developed AI software stack.
Amazon Web Services: Completed in-depth adaptation based on the Trainium2 chip and the Neuron SDK + vLLM inference framework, achieving Day 0 adaptation for global availability upon open-sourcing. The next-generation 3nm Trainium3 will further unleash the model's performance.
AMD: Provided Day-0 adaptation and comprehensive optimization support based on the ROCm open-source software stack.
Baidu Kunlun Chip: Ensured the stable and efficient operation of the model through underlying operator optimization and software-hardware co-acceleration.
Enflame Technology: Conducted in-depth optimization based on the self-developed TopsRider software stack and completed full adaptation on the Enflame L600.
Muxi: Achieved end-to-end native support from Triton syntax to the Muxi GPU instruction set based on the Xiyun C series and the full-stack self-developed MXMACA software stack.
Daysci: Achieved Day 0 in-depth adaptation.
In addition, the MiMo-V2.5 series of models also completed the Day 0 adaptation of the SGLang and vLLM mainstream inference frameworks.
03.
Giving Away 100 Trillion Tokens,
Partnering with Hermes Agent and Others
Meanwhile, Xiaomi launched the MiMo Orbit Program, which comprises two parts: the "100 Trillion Token Creator Incentive Program" for creators and the "Agent Ecosystem Co-construction Program" for agent framework teams.
Under the 100 Trillion Token Creator Incentive Program, Xiaomi will give tokens to AI users worldwide free of charge, with a total of 100 trillion tokens' worth of usage rights to be distributed within 30 days, until they run out.
The program is application-based. Successful applicants can receive up to the Max-level Token Plan, which includes 1.6 billion Credits worth 659 yuan. The program runs from 00:00 on April 28, 2026 to 00:00 on May 28, 2026, Beijing time.
Under the Agent Ecosystem Co-construction Program, Xiaomi provides dedicated support to agent framework teams worldwide, offering a limited amount of free MiMo tokens for the frameworks and participating in and sponsoring co-creation activities such as AI hackathons on the framework platforms.
So far, it has established in-depth cooperation with agent framework makers such as OpenCode, Hermes Agent, and KiloCode.
04.
Conclusion: Multiple Domestic Open-Source Models "Draw Swords" in the Competition
Recently, open-source momentum in the large-model industry has kept building. "Day 0" adaptation with domestic and international chips has gone from a selling point to table stakes, and inference efficiency and deployment cost have become the core of the next stage of competition. Meanwhile, the free giveaway of 100 trillion tokens and the co-construction of the agent framework ecosystem show the industry shifting from "competing on parameters" to "competing on applications."
Notably, Xiaomi's MiMo-V2.5-Pro has directly outperformed DeepSeek's latest open-source DeepSeek-V4-Pro in multiple benchmark evaluations. It has, so to speak, "drawn its sword" against DeepSeek on the open-source track, which may push the industry to cut inference costs faster and raise agents' completion rates on real tasks.
This article is from the WeChat official account "Zhidx" (ID: zhidxcom), written by Li Shuiqing and edited by Yun Peng. It is published by 36Kr with authorization.