MiniMax Launches Mavis: A Veritable Agent "Three Departments and Six Ministries"

Managing agents is like managing people. One should understand the art of governing subordinates...

I assigned a task, and the agent entered plan mode and laid out seven steps.

I approved it, and it started to execute. After completing three steps, it stopped and reported, "I've completed steps 1, 2, and 3. The results are as follows... May I ask if I should continue with steps 4, 5, 6, and 7?"

I said to continue. It completed two more steps and then stopped again. "I've completed steps 4 and 5. The results are as follows... May I ask if I should continue with steps 6 and 7?"

I had the agent work on a long-running task for a whole night, but it never actually ran for long. The entire back-and-forth was just me saying "continue".

For a long time, this has been my experience when using various agents to complete work.

This experience makes little sense. "Stopping to confirm" is a good working habit when collaborating with AI, but in many tasks I never asked the agent to stop, and it stopped anyway.

In its latest technical blog post, MiniMax attributes this behavior of agent products to "context anxiety". The core issue is that the model itself is unsure when an ultra-long task counts as completed. Put simply, it is not that the model can't do the task, but that it is afraid of making mistakes, so it keeps stopping partway to ask for instructions.

Today, the desktop version of MiniMax Agent received a major update that adds a new mode called Mavis (short for "MiniMax as a Jarvis").

Having one agent act as the boss and a group of agents act as employees is the traditional multi-agent framework, and it is nothing new. However, MiniMax points out that previous mainstream multi-agent frameworks essentially relied on prompt orchestration to make the model "role-play". This approach does not hold up for long and runs into problems such as the aforementioned context anxiety, degradation on long-running tasks, and agents checking their own work.

A multi-agent system requires reliable infrastructure that can run and be maintained continuously, and in which multiple agents won't "collude". This is what MiniMax is working on.

Hands-on test: letting agents "find fault" with each other

MiniMax named its Agent Team infrastructure Team Engine. Under this engine, there are three core roles: Leader, Worker, and Verifier. As the names suggest, one is for management, one for doing the work, and one for acceptance.

The most crucial difference is that there is an "adversarial" relationship between the Worker and the Verifier, and neither can get away with shoddy work.

Some time ago, APPSO was researching a topic: "All model manufacturers with ambitions in Coding/Agent should develop their own independent Coding/Agent products."

(Yes, MiniMax used to be a counter-example, but unexpectedly it proved itself before the article was even published!)

So we ran this topic on MiniMax's Agent Team again.

The task was split among five workers. After completing its part, each worker organized the results and handed them over to the leader (the status would show as "Mavis sent to General" or "General sent to Mavis", etc.).

One worker hadn't returned results after running for 12 minutes. APPSO noticed that the leader was impatient and sent a bash command to check its working status:

After all five workers completed their tasks, the leader generated five verifiers, shown as agents wearing "yellow hats" in the task list:

The verifiers quickly found errors! One of the verifiers discovered a clear data error in the corresponding worker's deliverables and gave a "failure" judgment. Immediately afterwards, the corresponding worker restarted (shown as running, with a small blue circle icon).

Clicking into that worker's workspace reveals its thinking process: "The verifier rejected my previous deliverables due to the following three errors... I need to go back and re-check the key facts and correct the specific numerical issues..."

It has to be said that agents are "impartial" towards each other, and they really work reliably.
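The reject-and-retry cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions, not MiniMax's Team Engine: the `worker`, `verifier`, and `run_with_verification` names, the `Verdict` type, and the deliberately wrong first answer are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    passed: bool
    errors: list[str] = field(default_factory=list)


def worker(task: str, feedback: list[str]) -> dict:
    """Hypothetical worker: produces a deliverable and corrects it once it has feedback."""
    value = 42 if feedback else 24  # the first attempt contains a deliberate data error
    return {"task": task, "value": value}


def verifier(deliverable: dict) -> Verdict:
    """Hypothetical verifier: checks the deliverable independently of the worker."""
    if deliverable["value"] != 42:
        return Verdict(False, [f"value {deliverable['value']} does not match the reference figure 42"])
    return Verdict(True)


def run_with_verification(task: str, max_rounds: int = 3) -> tuple[dict, int]:
    """Leader-side loop: rerun the worker until the verifier passes or rounds run out."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        deliverable = worker(task, feedback)
        verdict = verifier(deliverable)
        if verdict.passed:
            return deliverable, round_no
        feedback = verdict.errors  # the rejection reasons go back to the worker
    raise RuntimeError(f"{task!r} still failing after {max_rounds} rounds: {feedback}")


result, rounds = run_with_verification("market-size lookup")
print(result, rounds)
```

The point of the sketch is that the pass/fail decision lives outside the worker: the worker never grades itself, and a rejection carries concrete error messages into the next attempt.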

This kind of back-and-forth happened dozens of times across the five one-on-one worker-verifier pairings. Along the way, Mavis also said that it had "learned something new" and updated its memory.

While the previous task was running, we started a new in-depth study: analyzing the tourism market during the May Day holiday based on authoritative data and delivering a multi-dimensional analysis report.

This study is more complex than the previous task. Because of the continuous confrontation, the Agent Team also spends much more time on in-depth studies than a typical single agent.

However, the final report is indeed cleaner and more reliable than the in-depth research delivered by other AI products.

Recently, APPSO has been preparing many offline events, and planning and coming up with ideas have always been difficult. We also gave this task to Mavis to see how it works.

I need to plan an offline salon for AI developers in Guangzhou. Please provide me with as many suitable venues for events of a hundred or a thousand people in the technology field as possible, along with approximate quotes. Also, collect information on similar events, and help me plan the theme, promotion, and operation of this AI event. Organize all these into a strict business plan format and design a beautiful web page that matches the theme.

Planning alone took longer than it did for the previous in-depth study task. Mavis replied, "This task is large-scale and requires multiple agents to work in parallel: venue research, competitor information collection, theme planning, business plan preparation, and web development."

The outstanding feature of Mavis is that we can continuously add new requirements:

While giving me the long report, it would be best to draft a preliminary formal contract, including contracts for venue cooperation, cooperation with invited guests, etc. Also, provide a preliminary financial form and a detailed PPT for reporting this plan.

After the Agent Team received the new requirements, it further improved the plan and initiated more workflows. Finally, we launched as many as nine parallel tasks.

When we click to view Mavis's thinking process, we can see a large number of messages sent between agents. These agents work under the dedicated Team Engine, transmitting their respective states. Some are waiting, some are executing, and some are verifying.

Doesn't this Verifier look like a nit-picky "client"?

In the end, the task delivered more than ten files, including xls, ppt, and html web pages, along with their corresponding .md versions.

The financial budget table generated by the Agent Team includes a project budget summary, cash-flow forecast, ticket price and sponsorship pricing model, and a detailed cost ledger.

Next, let's talk about another major feature of Mavis: it can connect to chat platforms and support multitasking.

Similar to OpenClaw and Hermes Agent previously supported by MiniMax, Mavis itself can also take task assignments through the WeChat and Feishu IM channels. Setup is also very simple: click the settings button, scan the QR code, and name it, and Mavis is available in WeChat/Feishu.

When a typical agent product is connected to an IM app and given a task that takes a long time to complete, we usually can't ask it about anything else after sending that message.

Part of the reason is that these agents can't open multiple conversation windows simultaneously; another reason is the limitation of the agent's working mode. Running multiple tasks in one session can easily lead to context confusion and context pollution.

MiniMax's solution is to decouple the logic of "instant response" and "execution".

In Feishu, APPSO asked it to research the recent rise in oil prices. After that task started, I asked it to research the important products launched by Silicon Valley AI giants in the past month.

Mavis did not stop the earlier task. It told me directly when the new task was completed, while the oil price research was still in progress.
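One way to picture the split between "instant response" and "execution" is a front end that only schedules work and replies at once, while the jobs run as background tasks. The sketch below uses Python's `asyncio` for this; the `Frontend` class, its method names, and the `gate` event that stands in for a slow research job are assumptions made for illustration, not Mavis internals.

```python
import asyncio
from typing import Optional


class Frontend:
    """Hypothetical chat front end: acknowledges at once, executes in the background."""

    def __init__(self) -> None:
        self.tasks: dict[str, asyncio.Task] = {}

    async def _execute(self, name: str, gate: Optional[asyncio.Event]) -> str:
        if gate is not None:
            await gate.wait()  # stand-in for a long-running research job
        return f"{name}: done"

    def handle_message(self, name: str, gate: Optional[asyncio.Event] = None) -> str:
        # Schedule the job and return immediately; the chat loop is never blocked.
        self.tasks[name] = asyncio.create_task(self._execute(name, gate))
        return f"accepted {name}"

    def status(self) -> dict[str, str]:
        return {n: ("done" if t.done() else "running") for n, t in self.tasks.items()}


async def demo() -> dict[str, str]:
    fe = Frontend()
    slow = asyncio.Event()
    fe.handle_message("oil prices", gate=slow)  # long task, held open by the gate
    fe.handle_message("AI launches")            # second request sent while the first runs
    await fe.tasks["AI launches"]               # the short task finishes on its own
    snapshot = fe.status()
    slow.set()                                  # release the long task before exiting
    await asyncio.gather(*fe.tasks.values())
    return snapshot


snapshot = asyncio.run(demo())
print(snapshot)
```

Because `handle_message` returns before the job finishes, a second request never waits on the first, which matches the behavior observed in the Feishu test.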

This reflects another major design concept of Mavis: context isolation.

Each Agent Team and each agent in the team see only the information summaries related to their own tasks, and read the full text only when details are needed.
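A minimal sketch of the "summary by default, full text on demand" idea follows. It assumes a simple in-memory store; the `Artifact` and `Workspace` names, the task labels, and the placeholder strings are all hypothetical and do not describe how Team Engine actually stores context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    owner: str      # task the artifact belongs to
    summary: str    # short digest that is shared by default
    full_text: str  # complete content, loaded only on request


class Workspace:
    """Hypothetical store that hands each agent only the summaries for its own task."""

    def __init__(self) -> None:
        self._items: dict[str, Artifact] = {}

    def put(self, key: str, artifact: Artifact) -> None:
        self._items[key] = artifact

    def context_for(self, task: str) -> dict[str, str]:
        # Default view: summaries only, filtered to the requesting task.
        return {k: a.summary for k, a in self._items.items() if a.owner == task}

    def read_full(self, task: str, key: str) -> str:
        # Full text is an explicit, per-artifact request and stays task-scoped.
        artifact = self._items[key]
        if artifact.owner != task:
            raise PermissionError(f"{key!r} is outside the context of task {task!r}")
        return artifact.full_text


ws = Workspace()
ws.put("venues", Artifact("salon", "venue shortlist (placeholder)", "full venue table (placeholder)"))
ws.put("oil", Artifact("oil-research", "oil summary (placeholder)", "full oil report (placeholder)"))
print(ws.context_for("salon"))
```

Keeping the default view small is what prevents two concurrent tasks from polluting each other's context: an agent working on the salon plan never sees the oil research unless it is explicitly granted.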
