Hands-On with MiniMax M2.7: When AI Gets Competitive, It Even Outdoes Itself
After Lobster became extremely popular, the entire internet fixed its attention on how to use it: local deployment or cloud, one-click installation or the command line, whether to connect it to WeChat or Feishu... Meanwhile, almost no one seriously asked the old question: Is the "brain" driving Lobster smart enough?
This is not surprising. The new models recently released by OpenAI and Google are all Mini or Flash versions. The official message is barely hidden: these models are built for Agents that consume tokens in large quantities.
The capability ceiling of the model itself has become the least-discussed topic.
A model truly suited to Lobster needs more than a generous, cost-effective token budget. It also needs to be smart enough, with strong practical and learning abilities.
MiniMax recently launched its new MiniMax M2.7 model, which it describes as "initiating the self-evolution of AI" and "the strongest Cowork Agent model". It can handle coding work and common Office tasks, and it can also actively learn to build a stable Agent system.
Specifically, it handles a wider range of tasks than most models. In coding, M2.7 can reason about what actually happens while a system is running, reaching SRE (Site Reliability Engineering) level system reasoning: reading logs, correlating timelines, inferring root causes, and proposing prioritized fixes. The new model scored 56.2% on SWE-Pro, almost catching up with Opus 4.6.
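To make "correlating timelines" concrete, here is a toy sketch of the idea. The log lines, service names, and the `correlate` helper are all invented for illustration and are not taken from MiniMax's tooling. It merges events from two services into one timeline and lists what happened shortly before the first error.

```python
from datetime import datetime, timedelta

# Hypothetical log lines from two services (made up for illustration).
API_LOG = [
    "2025-01-01T10:00:05 INFO request started",
    "2025-01-01T10:00:09 ERROR upstream timeout",
]
DB_LOG = [
    "2025-01-01T10:00:01 INFO connection pool healthy",
    "2025-01-01T10:00:07 WARN connection pool exhausted",
]


def parse(source, line):
    """Split 'timestamp LEVEL message' into a sortable tuple."""
    stamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), source, level, message


def correlate(logs, window_seconds=5):
    """Merge all logs into one timeline and return the events that
    precede the first ERROR within the given time window."""
    timeline = sorted(
        parse(source, line) for source, lines in logs.items() for line in lines
    )
    first_error = next((event for event in timeline if event[2] == "ERROR"), None)
    if first_error is None:
        return []
    start = first_error[0] - timedelta(seconds=window_seconds)
    return [event for event in timeline if start <= event[0] < first_error[0]]


suspects = correlate({"api": API_LOG, "db": DB_LOG})
for when, source, level, message in suspects:
    print(when.time(), source, level, message)
```

A real root-cause analysis would weigh far more signals than timestamps, but the merged timeline is usually the starting point.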
It is capable enough for office scenarios. M2.7 has improved significantly at complex editing and multi-round revisions of Excel, Word, and PPT files, especially in scenarios such as financial analysis that require professional knowledge and formatted deliverables. It can't completely replace professionals, but it can definitely serve as an excellent assistant in the workflow.
It won't "break down" in multi-Agent collaboration. This is a specialized ability of M2.7. In multi-role scenarios with clear boundaries, it can still maintain a high level of instruction-following ability in a complex environment with over 50 Skills.
The highlight of this update is that the model has started to take part in optimizing itself. MiniMax says M2.7 is its first model to participate deeply in its own iteration rather than merely assist with it. With this capacity for self-evolution, M2.7 can independently iterate on the Agent Harness to handle most workflows.
This improvement in practical ability also let MiniMax M2.7 climb the Lobster leaderboard quickly after its release, reaching fourth place in the highest-score ranking.
The PinchBench leaderboard is a model evaluation benchmark tailored for OpenClaw. It tests how large models perform in real OpenClaw business scenarios. The figure shows the task success rate. MiniMax M2.7 ranks fourth, after Claude Opus 4.6. Source: https://pinchbench.com/
We also integrated the MiniMax M2.7 model and MiniMax's MaxClaw into Claude Code and a locally deployed Lobster. Then we handed it everything: the bugs we hit in real development, tedious financial data, and a large number of long-running tasks.
After two days of testing, we concluded that software is not the only thing that needs to be redesigned for AI. AI models themselves, beyond understanding human intent and producing satisfactory results, also need to understand how AI works, follow AI workflows, and learn to optimize themselves.
Using the AI workflow as a human assistant
Now that Agent frameworks such as OpenClaw have taken off, the real "AI-era workflow" should have AI as the core operating hub: calling dozens of tools, directing other AI teammates, and even optimizing its own code.
Before testing how MiniMax M2.7 evolves itself, I first wanted to see how well it handles an AI workflow. Is it a genuinely useful Agent model, or one that posts good-looking benchmark results but is hard to use in practice?
We downloaded a set of historical stock data from the well-known machine learning competition website Kaggle. Then, according to the competition requirements, we asked MiniMax M2.7 to help us meet the corresponding needs, that is, to perform appropriate data processing and feature engineering based on the given data and generate a visual analysis report for us.
The entire dataset is quite large, with over 3000 rows of tabular data, and the overall file size reaches 446.35 MB. After downloading the five tabular data files locally, we used Claude Code integrated with MiniMax M2.7 to complete this task.
To do a good job in this analysis, the model needs to act as a data analyst to clean and organize the data, a macro analyst to gain insights into the financial market, a statistical analyst to perform preliminary mathematical modeling, an algorithm engineer to build the corresponding model, and finally a web engineer to provide a visual solution.
Facing such a complex task, MiniMax M2.7 made full use of the Skills I had installed. It first used the xlsx Skill officially provided by Anthropic to read the structure of the tabular data, then started writing Python code, automatically installed the Pandas library (commonly used for processing tabular data), and proceeded step by step.
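For readers unfamiliar with feature engineering on price data, the following minimal sketch shows two typical derived features: daily returns and a rolling average. The prices are invented, the function names are hypothetical, and this is not the code M2.7 wrote. The model's run used Pandas, while this version sticks to the Python standard library so it runs without extra installs.

```python
from statistics import mean

# Hypothetical closing prices for one stock (not from the Kaggle dataset).
closes = [100.0, 102.0, 101.0, 105.0, 110.0]


def daily_returns(prices):
    """Fractional change between consecutive closing prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def rolling_mean(values, window):
    """Mean of each full window of `window` consecutive values."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


returns = daily_returns(closes)
ma3 = rolling_mean(closes, window=3)

print([round(r, 4) for r in returns])
print([round(m, 2) for m in ma3])
```

In Pandas the same features are commonly computed with `Series.pct_change()` and `Series.rolling(3).mean()`.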
Finally, MiniMax M2.7 also provided a complete visual solution. It generated multiple images to show the return distribution, the importance of different features, category rankings, and a comprehensive dashboard.
For the web page, it used the Streamlit library to turn the data script directly into an interactive web app in which all the information can be explored dynamically.
If MiniMax M2.7 can complete a project of this scale, everyday office and programming tasks go without saying.
First, we operated Lobster on our mobile phones to summarize the files on our computers. Then we asked MiniMax M2.7 to write a research plan in a Word document based on this file, organize an Excel document of relevant papers, and finally create a PPT document for a group meeting report, all of which can be operated directly on the mobile phone.
Lobster integrated with MiniMax M2.7 can quickly respond to requests
Handling the three major Office applications is a piece of cake now
This strength in office work also earned MiniMax M2.7 an Elo score of 1495 on the GDPval-AA evaluation, which measures professional knowledge and task delivery ability, making it the highest-scoring domestic model.
Some time ago, a visual panel for AI work assistants became very popular. It places Lobster in an anime-style office and can be installed in your own OpenClaw with a single sentence. We gave our Appso little lobster its own home this way. But what if I want to change the layout of the anime room? Leave it to MiniMax.
In OpenClaw's local visual interface, we simply sent the message "How can I modify the style of this small house?" MiniMax M2.7 automatically read the project code and then told us which parts could be modified and how.
Since my request was for the style of a technology editorial office, it helped me modify it to have a Star Wars poster and added more than a dozen people sitting in front of computers typing.
However, we didn't configure the API Key of Nano Banana Pro in OpenClaw, so MiniMax M2.7 in OpenClaw chose to generate simple pictures using code.
Then, by chatting with it, we can also design an editorial tycoon game based on this style, where the one who completes more tasks will have a larger office and can level up.
MiniMax's official MaxClaw, by contrast, supports multimodal generation directly and can produce video, audio, and images in one step, with no additional APIs to configure.
We used the officially provided gif-sticker-maker Skill to generate several Elon Musk emojis. The cloud-deployed MaxClaw keeps the operating environment secure, but it doesn't let us install arbitrary libraries the way we could on a local computer.
As a result, when converting a video into a GIF, MaxClaw told me that it didn't have sufficient permissions to install ffmpeg (an open-source multimedia processing tool) on the cloud server.
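For context, a video-to-GIF conversion with ffmpeg usually comes down to one command. The sketch below only builds that command as an argument list. The file names, the `gif_command` helper, and the default frame rate and width are assumptions for illustration, not what MaxClaw attempted to run.

```python
import shlex


def gif_command(video_path, gif_path, fps=10, width=480):
    """Build an ffmpeg argument list that converts a video to a looping GIF."""
    # fps lowers the frame rate; scale resizes to `width` pixels wide and
    # keeps the aspect ratio (-1) using the lanczos scaler.
    filters = f"fps={fps},scale={width}:-1:flags=lanczos"
    # -y overwrites an existing output file; -loop 0 makes the GIF loop forever.
    return ["ffmpeg", "-y", "-i", video_path, "-vf", filters, "-loop", "0", gif_path]


cmd = gif_command("clip.mp4", "clip.gif")
print(shlex.join(cmd))
# To actually run it (requires ffmpeg on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

On a machine where ffmpeg is installed, passing `cmd` to `subprocess.run` performs the conversion; in a locked-down cloud sandbox like the one described here, the binary itself is the missing piece.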
You can use MiniMax M2.7 directly in MaxClaw. It automatically calls video, audio, and image generation models such as Conch to produce multimedia files for us, without the need to configure a dedicated API key.
By clicking on the skills below the MaxClaw dialog box, we can view the details of all the Skills installed in MaxClaw. Clicking "Ask MaxClaw" automatically composes the message "Tell me what frontend-dev can do and how to use it", guiding us to learn how to use this Skill.
In addition to the GIF generation Skill, MiniMax also provides a Skill library covering front-end development, full-stack back-end development, Android and iOS application development, and GLSL shader techniques for creating striking visual effects. We can directly send the message "Can you help me install the