
Hire a full-stack stand-in for $1 an hour: MiniMax M2.5 lets office workers feel like the boss

QbitAI (量子位) 2026-02-13 11:08
Specializing in intelligent agents and Vibe Coding

A dark horse has emerged in the battle of Spring Festival models.

Today, MiniMax officially announced its new model, M2.5, which went live two days ahead of schedule. It still focuses on agents and Vibe Coding, with performance comparable to Claude Opus 4.6.

It isn't picky about platforms. It can write code for PC, mobile apps, React Native, and Flutter, and it is a true full-stack solution covering front end, back end, and database.

Previous models could create a front-end interface at most, but M2.5 can deliver the whole package: front end, back end, and data storage.

It's also designed for the agent ecosystem. When combined with scaffolding like OpenClaw, it can directly translate your natural language into specific computer operations.

You only need to understand the business logic. It handles the rest of the full-stack implementation at 100 TPS (tokens per second), and the cost is only $1 per hour.
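As a back-of-the-envelope check, the two figures quoted above (100 TPS and $1 per hour) can be turned into an implied per-token price. The sketch below assumes the model generates output continuously at the full quoted rate for the whole hour, which the article does not state; MiniMax's actual pricing may be structured differently.

```python
# Back-of-the-envelope arithmetic from the article's two figures.
# Assumption: output is generated continuously at the quoted rate.
TOKENS_PER_SECOND = 100    # throughput quoted in the article
COST_PER_HOUR_USD = 1.0    # hourly cost quoted in the article


def tokens_per_hour(tps: float) -> int:
    """Tokens produced in one hour at a constant rate."""
    return int(tps * 3600)


def implied_price_per_million(cost_per_hour: float, tps: float) -> float:
    """Implied price in USD per one million generated tokens."""
    return cost_per_hour / tokens_per_hour(tps) * 1_000_000


if __name__ == "__main__":
    print(tokens_per_hour(TOKENS_PER_SECOND))  # 360000
    price = implied_price_per_million(COST_PER_HOUR_USD, TOKENS_PER_SECOND)
    print(round(price, 2))  # 2.78
```

Under that assumption, $1 per hour works out to roughly $2.78 per million generated tokens.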

10B active parameters place it in the first echelon

In terms of the two key indicators of code writing and task execution, M2.5 is now on par with Claude Opus 4.6.

For example, on SWE-Bench Verified, one of the most demanding coding benchmarks, it scored 80.2%. It even ranked first on the multilingual Multi-SWE-Bench.

Moreover, in Vibe Coding mode, it can handle the entire full-stack process, from the interface to back-end logic and database design, delivering a complete set of usable code in one go.

Take, for example, a request for a "luxury cat tunnel e-commerce website" with a minimalist style, a parallax scrolling effect, and a 3D configurator in the background.

The site M2.5 generated opens with a blockbuster-style autoplay video, and the 3D configurator works well. The overall look is very high-end, and the result is a complete project that actually runs.

This confidence comes from its evolved "native Spec behavior": before writing code, it proactively breaks down the functional structure and UI design, the way an architect would.

It can handle the full stack because it has been trained on more than 10 programming languages, including Go, Rust, and Python, across hundreds of thousands of real-world environments.

M2.5 has also been specially optimized for long-chain tasks. It works smoothly with both mainstream frameworks and self-written scripts.

Here it introduces a Process Reward mechanism, which monitors completion quality across the entire chain and addresses the tendency of long-running tasks to go off-track.
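The article does not describe how the Process Reward mechanism is implemented. Purely as an illustration of the general idea, the sketch below contrasts an outcome-only reward with a per-step reward. The `Step` type, the step scores, the simple averaging, and the threshold are all hypothetical and are not taken from MiniMax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    score: float  # hypothetical quality score in [0, 1] for this step


def outcome_reward(steps: List[Step], final_ok: bool) -> float:
    """Reward only the final result; drift in the middle is invisible."""
    return 1.0 if final_ok else 0.0


def process_reward(steps: List[Step]) -> float:
    """Average per-step quality, so one bad step lowers the reward
    even when the final result happens to pass."""
    if not steps:
        return 0.0
    return sum(s.score for s in steps) / len(steps)


def first_off_track(steps: List[Step], threshold: float = 0.5) -> Optional[str]:
    """Name of the first step scoring below the threshold, if any."""
    for s in steps:
        if s.score < threshold:
            return s.name
    return None


if __name__ == "__main__":
    chain = [Step("plan", 1.0), Step("crawl", 0.2), Step("report", 0.9)]
    print(outcome_reward(chain, final_ok=True))  # 1.0
    print(first_off_track(chain))                # crawl
```

The point of the contrast: an outcome-only signal scores this chain as perfect, while a per-step signal flags the weak "crawl" step in the middle.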

The benefit of this mechanism is most obvious in tedious, repetitive tasks. For example, to compile the Forbes rich list, it needs to scrape each person's net worth, age, and source of wealth.

The spreadsheet M2.5 generated looks very professional. It automatically creates three sheets, Cover, BillionairesData, and Sources, clearly separating the cover, the detailed data, and the data sources. The formatting is as tidy as if a perfectionist employee had done it.

Despite handling such complex tasks, M2.5 has only 10B active parameters, making it the smallest flagship model in the first echelon.

Combined with a deeply optimized thinking process, its inference throughput reaches 100 TPS, twice the speed of mainstream flagship models. On large-scale data cleaning or bug-fixing tasks, you get the thrill of near-instant results.

Capable of writing full-stack code and operating local systems

The previous two online demos were just appetizers. Now, let's put M2.5 to the test in a real-world agent environment.

According to MiniMax, adapting to various agent frameworks is one of M2.5's major strengths.

When it comes to agent frameworks, the popular OpenClaw can't be ignored. So I installed it on my computer and connected M2.5 to it to see what happens.

Since M2.5 is newly released, there's no option for it in OpenClaw's installation wizard yet. So the installation process was a bit manual, and I won't go into details here. Anyway, it was successfully connected in the end.

However, talking to OpenClaw through its backend dashboard is too cumbersome, so I decided to connect it to Feishu.

The stage is set for M2.5. Now let's see how it performs.

I used Python to generate a folder on my desktop with 100 messy financial files, and then gave OpenClaw a straightforward task: First, clean up all the file names and unify them into the format of "date + supplier + amount".
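The article does not show the original file names or the script OpenClaw ran. As a minimal sketch of what normalizing to "date + supplier + amount" can look like, the example below assumes hypothetical messy names in which the amount is prefixed with `$`, the date has a four-digit year, and the supplier comes from a known list. The function `normalize_name` and both regular expressions are illustrative.

```python
import os
import re
from datetime import date
from typing import List

# Assumed conventions (not from the article): "$" marks the amount,
# dates look like 2026.01.15 / 2026-01-15 / 20260115.
AMOUNT_RE = re.compile(r"\$\s?([\d,]+(?:\.\d{1,2})?)")
DATE_RE = re.compile(r"(\d{4})[.\-_/]?(\d{2})[.\-_/]?(\d{2})")


def normalize_name(filename: str, suppliers: List[str]) -> str:
    """Return '<YYYY-MM-DD>_<supplier>_<amount>.<ext>' for a messy name."""
    stem, ext = os.path.splitext(filename)

    amount_match = AMOUNT_RE.search(stem)
    if amount_match is None:
        raise ValueError(f"no amount found in {filename!r}")
    amount = float(amount_match.group(1).replace(",", ""))

    # Drop the amount first so its digits cannot be mistaken for a date.
    rest = stem[:amount_match.start()] + stem[amount_match.end():]

    date_match = DATE_RE.search(rest)
    if date_match is None:
        raise ValueError(f"no date found in {filename!r}")
    year, month, day = (int(g) for g in date_match.groups())
    when = date(year, month, day)  # raises ValueError on impossible dates

    supplier = next((s for s in suppliers if s.lower() in rest.lower()), None)
    if supplier is None:
        raise ValueError(f"no known supplier in {filename!r}")

    return f"{when.isoformat()}_{supplier}_{amount:.2f}{ext.lower()}"


if __name__ == "__main__":
    known = ["Nebula Cloud Computing", "Acme Office"]  # hypothetical list
    print(normalize_name(
        "invoice_Nebula Cloud Computing_2026.01.15_$1,200.50.pdf", known))
    # 2026-01-15_Nebula Cloud Computing_1200.50.pdf
```

In a real run, the returned name would be passed to something like `pathlib.Path.rename`, ideally after a dry run that only prints the planned changes.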

Of course, that's not all. It needs to understand these data thoroughly, sort them by expenditure categories, and finally generate a monthly financial analysis PPT with charts, which should be both informative and aesthetically pleasing.

Let's take a look at what the files looked like before the organization:

Next, we'll assign the task to OpenClaw, which is being operated by M2.5, through Feishu.

All of a sudden, all the files in the folder had their names changed to the format we required.

Meanwhile, in Feishu, OpenClaw reported its work progress and summarized the monthly expenditure.

As for the PPT, obviously I was too lazy to look for it in the folder, so I just asked OpenClaw to send it to me through Feishu.

Now for the exciting part: acceptance testing.

OpenClaw, directed by M2.5, chose a techy dark theme with a blue-green color scheme that is easy on the eyes.

Moreover, it didn't just fill in the data. It really understood those bills.

For example, in the pie chart, it immediately noticed that "cloud computing services" accounted for almost 90% of the total, and on the key indicators page it specifically flagged the second week as having the highest expenditure.

On the last page, it also offered suggestions for improvement. It found that too much money was being spent on "Nebula Cloud Computing" and suggested negotiating an annual contract to reduce costs. This ability to extract business insights from data goes beyond simple chart-making.
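The aggregation behind a pie chart of category shares and a "highest-spending week" callout is straightforward. The sketch below shows one way to compute both; the record layout (`category`, `week`, `amount`) and the sample numbers are made up for illustration and are not the data from the test.

```python
from collections import defaultdict
from typing import Dict, List


def category_shares(records: List[dict]) -> Dict[str, float]:
    """Fraction of total spend per category (values sum to 1)."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[r["category"]] += r["amount"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: amt / grand_total for cat, amt in totals.items()}


def peak_week(records: List[dict]) -> int:
    """Week number with the highest total spend."""
    by_week: Dict[int, float] = defaultdict(float)
    for r in records:
        by_week[r["week"]] += r["amount"]
    return max(by_week, key=by_week.get)


if __name__ == "__main__":
    # Made-up sample records, chosen only to mirror the pattern described.
    sample = [
        {"category": "cloud computing services", "week": 1, "amount": 2000.0},
        {"category": "cloud computing services", "week": 2, "amount": 7000.0},
        {"category": "office supplies", "week": 2, "amount": 600.0},
        {"category": "travel", "week": 3, "amount": 400.0},
    ]
    print(category_shares(sample)["cloud computing services"])  # 0.9
    print(peak_week(sample))  # 2
```

The numbers are the easy part; the recommendation to negotiate an annual contract is the step that requires reading the result as a business problem.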

It can be seen that in the agent environment, M2.5 is indeed a qualified "brain", giving me the feeling of being a boss ✨(⌐■_■)✨.

In addition to agents, another skill that MiniMax is proud of is Vibe Coding.

Here, we'll use VSCode with Cline to see whether M2.5 can handle the entire development process, including back end, front end, communication, deployment, and debugging, in one go.

I asked it to write a real-time, multi-user collaborative to-do list system using Java Spring Boot.

The functionality is not trivial. It needs to use WebSocket for real-time synchronization across multiple devices, and it must strictly control permissions so that only the person who created a task can modify it.
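The two requirements, creator-only edits and pushing every change to all connected clients, can be illustrated without a framework. The sketch below is a language-agnostic model written in Python, not the Spring Boot code M2.5 generated. `TodoBoard`, `PermissionDenied`, and the callback-based `subscribe` (standing in for WebSocket sessions) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class PermissionDenied(Exception):
    """Raised when a user tries to modify someone else's task."""


@dataclass
class Task:
    id: int
    title: str
    creator: str
    done: bool = False


class TodoBoard:
    def __init__(self) -> None:
        self._tasks: Dict[int, Task] = {}
        self._next_id = 1
        # Callbacks stand in for connected WebSocket sessions.
        self._subscribers: List[Callable[[str, Task], None]] = []

    def subscribe(self, callback: Callable[[str, Task], None]) -> None:
        self._subscribers.append(callback)

    def _broadcast(self, event: str, task: Task) -> None:
        for callback in self._subscribers:
            callback(event, task)

    def create(self, user: str, title: str) -> Task:
        task = Task(self._next_id, title, creator=user)
        self._tasks[task.id] = task
        self._next_id += 1
        self._broadcast("created", task)
        return task

    def update(self, user: str, task_id: int, *,
               title: Optional[str] = None,
               done: Optional[bool] = None) -> Task:
        task = self._tasks[task_id]
        # Enforce the rule on the server side, before any state changes.
        if task.creator != user:
            raise PermissionDenied(f"{user} did not create task {task_id}")
        if title is not None:
            task.title = title
        if done is not None:
            task.done = done
        self._broadcast("updated", task)
        return task
```

The key design point is that the ownership check runs on the server before anything is changed or broadcast, so a client cannot bypass it by crafting its own messages.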

There are also requirements for the interface aesthetics. It must show a sense of technology, giving the impression of a hacker terminal.

After receiving the task, M2.5 started by writing two files: pom.xml and application.yml.

These two files are the "heart" and "brain" of a Java Spring Boot project. pom.xml is like a shopping list for the build tool (Maven): it lists the ready-made components (dependency packages) this to-do list project needs.

application.yml is the program's settings panel, or operating manual. It sets the rules for how the software should run after startup.

With these two files in place, it went on to write the main application and the Java code for each module, as well as the front-end HTML. It also created a database file.

Once all this was written, Cline, driven by M2.5, automatically compiled and ran the program. Whenever errors occurred along the way, it read the error output and modified the code on its own.
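The compile, read errors, fix, retry cycle described here is a simple loop. The sketch below shows its shape under stated assumptions: `build_until_green` and its two callbacks are hypothetical, `apply_fix` stands in for the model editing the code, and `maven_build` is only one example of a build step (it assumes Maven is installed and on the PATH). This is not how Cline is implemented internally.

```python
import subprocess
from typing import Callable, Tuple


def maven_build() -> Tuple[bool, str]:
    """Example build step: run 'mvn compile' and capture its output.
    Assumes Maven is installed; not exercised in the demo below."""
    result = subprocess.run(["mvn", "compile"],
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def build_until_green(run_build: Callable[[], Tuple[bool, str]],
                      apply_fix: Callable[[str], None],
                      max_attempts: int = 5) -> int:
    """Run the build, feeding each failure log to apply_fix, until it
    succeeds. Returns the number of build attempts that were needed."""
    for attempt in range(1, max_attempts + 1):
        ok, log = run_build()
        if ok:
            return attempt
        apply_fix(log)  # hypothetical: the model edits code from the log
    raise RuntimeError(f"build still failing after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated project with two compile errors, each fixed in one pass.
    state = {"errors": 2}

    def fake_build() -> Tuple[bool, str]:
        if state["errors"] > 0:
            return False, f"{state['errors']} error(s) remaining"
        return True, "BUILD SUCCESS"

    def fake_fix(log: str) -> None:
        state["errors"] -= 1

    print(build_until_green(fake_build, fake_fix))  # 3
```

Capping the number of attempts matters in practice: without it, an agent that cannot fix an error would keep rebuilding indefinitely.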

After some effort, the back-end program finally started, and the front-end page came up on port 8080. The interface was indeed both simple and had the technological feel I had asked for.