Hangzhou's dominance of global open-source large models has come to an end. Shanghai's MiniMax M2 has drawn a flood of orders on release, with one million tokens costing as little as 8 RMB.
The throne of open-source models has changed hands again, and once again it belongs to a domestic model!
Previously, the chart-topping models DeepSeek and Qwen came from Hangzhou. Now it is MiniMax from Shanghai.
In tests by the third-party evaluation institution Artificial Analysis, MiniMax M2 scored 61 points, ranking first among open-source models and trailing only slightly behind Claude Sonnet 4.5.
According to the official introduction, Minimax M2 is specifically designed for agents and programming, excelling in programming capabilities and Agent performance.
Moreover, it is cost-effective: its inference speed is twice that of Claude 3.5 Sonnet, while its API price is only 8% as high.
Minimax stated that intelligence level, speed, and cost were previously regarded as an "impossible triangle," but with the emergence of M2, this triangle has been broken.
Currently, the complete model weights of M2 have been open-sourced under the MIT license, and the online Agent platform and API are also free for a limited time.
Achieve Claude-level performance at 8% of the cost
MiniMax M2 is a highly sparse MoE model with 230B total parameters, of which only 10B are active per token.
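The sparsity comes from expert routing: each token activates only a few experts, so only a fraction of the 230B parameters does any work per forward pass. A minimal sketch of top-k routing (the expert count and dimensions below are illustrative, not M2's actual configuration):

```python
import numpy as np

def moe_top_k_routing(token_hidden, num_experts=64, top_k=2, seed=0):
    """Top-k expert routing sketch: a router scores every expert, but only
    the top-k highest-scoring experts actually run for this token, which is
    why only a fraction of the total parameters is active per token."""
    rng = np.random.default_rng(seed)
    router = rng.standard_normal((token_hidden.shape[-1], num_experts))
    scores = token_hidden @ router                 # one logit per expert
    chosen = np.argsort(scores)[-top_k:]           # indices of the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                       # softmax over the chosen experts
    return chosen, weights

chosen, weights = moe_top_k_routing(np.ones(32), num_experts=8, top_k=2)
print(len(chosen), round(float(weights.sum()), 6))  # → 2 1.0

# M2's headline ratio: 10B active out of 230B total parameters
print(f"{10 / 230:.1%} of parameters active per token")  # → 4.3% of parameters active per token
```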
Netizens noted that with only 10B active parameters it should run very fast; paired with an inference-acceleration platform such as Cerebras or Groq, it could exceed a thousand tokens per second.
Another feature is its interleaved thinking format, which lets the model plan and verify its steps across multiple conversation turns, a capability crucial for agent reasoning.
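In practice, "interleaved thinking" can be pictured as a conversation history that retains the model's planning segments between turns (the message shapes and `<think>` markers below are illustrative assumptions, not M2's exact API):

```python
# Illustrative sketch of an interleaved-thinking history. The point:
# planning/verification segments stay in context across turns instead of
# being stripped after each reply.
messages = [
    {"role": "user", "content": "Find and fix the bug in utils.py."},
    {"role": "assistant", "content": (
        "<think>Plan: 1) run the tests, 2) locate the failure, "
        "3) patch and re-verify.</think>Running the test suite first."
    )},
    {"role": "tool", "content": "pytest: 1 failed - test_parse_date"},
    {"role": "assistant", "content": (
        "<think>The failure matches step 2 of the plan; verify the fix "
        "against the same test before replying.</think>"
        "Patched parse_date; the suite now passes."
    )},
]

# The next request would send the FULL history, thinking segments included,
# so the model can check its earlier plan against later tool results.
kept = [m for m in messages if m["role"] == "assistant" and "<think>" in m["content"]]
print(len(kept))  # → 2
```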
As introduced at the beginning, Minimax officially defines M2 as a model specifically designed for agents and programming.
It is built for end-to-end development workflows, demonstrating strong planning and stable execution on complex, long-chain tool-invocation tasks, and it supports calling the shell, browsers, Python code interpreters, and various MCP tools.
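At its core, the tool-invocation workflow described here is a plan-act-observe loop. A toy sketch, with a stub standing in for the real M2 API call (`fake_model`, and the single shell tool, are invented for illustration; browsers, Python interpreters, or MCP tools would slot into the same loop):

```python
import subprocess

def fake_model(history):
    """Stand-in for an M2 API call: emit one shell command, then finish."""
    if len(history) == 1:                     # first step: run one command
        return {"tool": "echo hello", "final": None}
    return {"tool": None, "final": history[-1].strip()}  # done: report result

def run_shell(cmd):
    """The one tool wired up in this sketch: run a shell command."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def agent_loop(task, max_steps=5):
    history = [task]
    for _ in range(max_steps):
        action = fake_model(history)
        if action["tool"] is None:            # the model decided it is finished
            return action["final"]
        history.append(run_shell(action["tool"]))   # observe, then continue
    return "step budget exhausted"

print(agent_loop("greet me"))  # → hello
```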
In the three capabilities most critical to an agent (programming, tool use, and deep search), M2 is comparable to top overseas models in tool use and deep search, and its programming ability also ranks among the best in China.
In terms of overall performance, M2 ranked fifth overall and first among open-source models in the tests of Artificial Analysis.
This test used 10 popular datasets, including MMLU-Pro, GPQA Diamond, Humanity's Last Exam, and LiveCodeBench.
M2 is priced at $0.3 / ¥2.1 per million input tokens and $1.2 / ¥8.4 per million output tokens, only 8% of Claude 3.5 Sonnet's price.
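At these list prices, estimating a call's cost is simple arithmetic (prices taken from this article; the token counts below are made up for illustration):

```python
# M2 API list prices from this article (USD per million tokens)
PRICE_IN = 0.30    # input
PRICE_OUT = 1.20   # output

def m2_cost_usd(input_tokens, output_tokens):
    """Cost of one call at the listed per-million-token rates."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# Hypothetical agent session: 200k tokens in, 50k tokens out
print(round(m2_cost_usd(200_000, 50_000), 2))  # → 0.12
```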
Based on the results of Artificial Analysis, Minimax drew a graph to compare the cost-effectiveness of various models (the lower the cost, the further to the right on the horizontal axis).
The online inference service reaches about 100 tokens per second, and MiniMax also drew a graph charting speed against cost.
Meanwhile, the MiniMax team ran one-on-one comparisons between M2 and other models on three tasks: agentic tasks, full-site development, and terminal use.
The results showed that M2 achieved a very high win-plus-tie rate against Claude Sonnet 4.5, GLM 4.6, Kimi K2, and DeepSeek V3.2, at a far lower cost.
To demonstrate M2's agent capabilities more directly, MiniMax has deployed it on its Agent platform for free use for a limited time; officially, the free period lasts until the servers can no longer handle the load.
The platform also showcases many ready-made projects built with MiniMax Agent.
Minimax Agent: Can write programs and create PPTs
With MiniMax's Agent platform, all kinds of web pages and online applications can be built.
Of course, it can also recreate many classic games and deploy them directly in the web environment.
One netizen even built an online Gomoku platform that offers not only the game itself but also online matches, spectator mode, in-game chat, and even user registration.
In addition to programming, it can also generate research reports or PPTs on various topics.
On X, a netizen also showed a practical result of programming with the M2 agent: a football mini-game completed in just three rounds of feedback.
The result is genuinely impressive.
Beyond the model's performance, the attention mechanism used by M2 has also sparked discussions among netizens.
Hybrid attention vs. full attention
Some netizens dug up more technical details of M2 from the vLLM code, claiming that M2 uses a hybrid of full attention and sliding-window attention (SWA), similar to GPT-OSS.
However, the head of MiniMax's NLP team corrected this: the team initially planned to introduce SWA during pre-training but found it degraded performance, so they ultimately kept full attention.
Seeing this, an engineer on the Falcon team said they had observed the same phenomenon when training their own models: hybrid SWA attention reduced performance, contradicting the findings of some papers.
In some papers and in practice, SWA improves efficiency while maintaining performance; research on Mistral and Google's Gemma models supports this view.
However, MiniMax's own tests showed that it falls short on long-range dependency tasks.
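The long-range limitation is easy to see in the attention mask itself: full causal attention lets every token attend to all earlier tokens, while SWA cuts off everything beyond a fixed window. A minimal sketch:

```python
import numpy as np

def causal_mask(seq_len, window=None):
    """Boolean mask: True where query position i may attend to key j.
    window=None is full causal attention; an integer window gives
    sliding-window attention (SWA), which drops keys older than the window."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i                        # causal: never attend to the future
    if window is not None:
        mask &= (i - j) < window         # SWA: forget anything beyond the window
    return mask

# With a window of 3, token 5 loses sight of tokens 0-2 entirely,
# which is exactly the long-range dependency loss described above.
print(causal_mask(6)[5].astype(int))            # → [1 1 1 1 1 1]
print(causal_mask(6, window=3)[5].astype(int))  # → [0 0 0 1 1 1]
```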
Meanwhile, M2 also passed on Lightning Attention (a linear-attention variant), again because of performance loss.
Conversely, some papers argue that linear attention holds the advantage on long-sequence tasks.
Which approach is better may depend on the specific requirements, but judging from M2's performance, MiniMax has chosen the path that suits it.
Agent platform: https://agent.minimax.io
Hugging Face: https://huggingface.co/MiniMaxAI/MiniMax-M2
Reference links:
[1]https://www.minimax.io/news/minimax-m2
[2]https://venturebeat.com/ai/minimax-m2-is-the-new-king-of-open-source-llms-especially-for-agentic-tool
[3]https://x.com/jessi_cata/status/1982936050256490968
[4]https://x.com/JingweiZuo/status/198282297903069