Tencent AI still has a hidden card.
Tencent AI has just shown its hand: Yuanbao, the Lobster Special Forces, the panoramic view of shrimp farming. The product matrix is laid out on the table, and the ecosystem layout is clearly drawn.
But there's a hidden card under the revealed ones.
On March 27th, at the Tencent Cloud Shanghai Summit, A Dao, Chief Architect of MiniMax's Agent, described a training dilemma: reinforcement learning for large models had hit a sandbox bottleneck. An environment with hundreds of thousands of concurrent sandboxes simply couldn't run on K8S. At the same summit, Tang Daosheng made a judgment: "The implementation of AI is not just an algorithm problem, but also an engineering problem."
One is the real pain of an engineer at a large-model company; the other is the strategic insight of a big-company leader. Coming from different directions, the two bumped into the same thing, one that appears on no product release list and sits hidden at the bottom of the stack: a sandbox supporting hundreds of thousands of concurrent instances, with startup times of hundreds of milliseconds. Two groups of engineers quietly pushed the infrastructure into the next era of the Agent before the industry even realized it.
This is the hidden card named "Engineering".
Four Words on the Whiteboard
In early 2022, MiniMax didn't have this name yet. Founder Yan Junjie wrote "Next-Generation AI" on the whiteboard of a shabby office. GPT-3 had been released not long before, ChatGPT was still a year away, and few people were talking about "AGI". Four years later, in January 2026, MiniMax listed on the Hong Kong Stock Exchange, its shares rising 109% on the first day for a market value exceeding HK$100 billion.
But when A Dao (the nickname Miao Yuhang goes by within the company) showed the old photo of that whiteboard on the summit stage, he wasn't talking about the listing.
What he kept returning to was a technical dilemma: the underlying logic of model training has changed. The model no longer just finishes a test and hands in the paper. It has to operate files, write code, call tools, and handle exceptions in a real-world environment. Each trial and error requires an independent operating environment, and when demand swells to hundreds of thousands of concurrent instances, the underlying architecture of cloud computing cracks.
K8S Can't Hold Up
In the past, large models got smarter through classic reinforcement learning: get a question, generate an answer, receive a score, update parameters. But by the second half of 2025 the ceiling was clearly visible. The model was very good at "answering questions" in a closed environment, but its performance dropped sharply in the real world: it had no persistent state, so each conversation was a brand-new start; it could write code but couldn't run it, lacking the self-verification cycle of "write → run → check → fix"; and it couldn't access real-time knowledge or build a real working environment.
After all, a bare model is like an idling engine, and an engine by itself is not a car. Starting from version M2.5, MiniMax bet on another path: Agentic RL (Agent training based on reinforcement learning), which throws the model directly into a real operating-system environment to work. If it succeeds, the model's ability changes qualitatively. But it requires brand-new training infrastructure.
Previously, we might ask AI to write a paragraph or implement a simple function. With the arrival of the Agent, we are asking AI to repair a super-sized truck while it's running, or build a working iPhone from scratch. In the Agent era, the training tasks the model faces are extremely difficult.
Each training task may roll out hundreds of trial paths, and each path requires an independent sandbox environment. Facing thousands of user queries, each query needs hundreds of sandboxes running concurrently.
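The fan-out is multiplicative: queries × rollouts per query, each wanting its own isolated environment at the same moment. A toy sketch of that shape, with `launch_sandbox` standing in for a real sandbox API:

```python
import asyncio

async def launch_sandbox(query_id: int, rollout_id: int) -> str:
    """Stand-in for starting an isolated sandbox and running one trial path."""
    await asyncio.sleep(0)  # the real startup cost lives here
    return f"q{query_id}-r{rollout_id}"

async def rollout_all(num_queries: int, rollouts_per_query: int) -> list[str]:
    """Every query fans out into independent rollouts; all run concurrently."""
    tasks = [launch_sandbox(q, r)
             for q in range(num_queries)
             for r in range(rollouts_per_query)]
    # With 1,000 queries x 100 rollouts, this gather is 100,000 sandboxes at once.
    return await asyncio.gather(*tasks)

results = asyncio.run(rollout_all(4, 3))
```

The toy numbers (4 × 3) are for illustration; at production scale this burst of simultaneous launches is precisely the scheduling pattern K8S was never designed for.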
A Dao was blunt: "At first, we ran it on K8S. Then we found it really didn't work. The concurrency just couldn't get up." K8S is Kubernetes, the de facto scheduling standard of modern cloud computing. But this system, designed for the microservice era, can't handle tens of thousands of sandboxes being launched simultaneously for Agent training.
Yu Guangyou (Gary), Deputy General Manager of Tencent Cloud's Agent Runtime product, pointed out a fact: "Inside every large-model company, the training-sandbox infrastructure faces two dilemmas. First, it's CPU-based, not GPU-based, so it's hard to get papers out of it. Second, when the people running K8S see you pulling the master (hammering the core of the K8S system with resource requests) thousands or tens of thousands of times until it's overloaded, their first reaction is: can you stop pulling so much?"
This high-frequency, massive scheduling demand is the most invisible and headache-inducing "friction" in current large-model deployment, and it directly throttles model iteration.
Coincidentally, MiniMax releases a model version every month. It may be the only company in China at that frequency; globally, only OpenAI maintains a similar pace. Do the math: during Agentic RL training, every second the GPU cluster spends waiting for sandboxes to start is money wasted. With hundreds of thousands of concurrent sandboxes each taking several minutes to start, the accumulated waiting can consume hours or even days of GPU computing power.
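That back-of-envelope math is easy to reproduce. The numbers below are illustrative assumptions, not MiniMax's actual figures:

```python
# Illustrative assumptions only: a cluster of 1,000 GPUs that sits idle
# while one wave of sandboxes cold-starts.
idle_gpus = 1_000
startup_seconds_slow = 3 * 60   # minutes-level sandbox startup
startup_seconds_fast = 0.3      # hundreds-of-milliseconds startup

# GPU-hours burned per cold-start wave under each regime.
wasted_hours_slow = startup_seconds_slow * idle_gpus / 3600  # 50 GPU-hours
wasted_hours_fast = startup_seconds_fast * idle_gpus / 3600  # well under one

speedup = startup_seconds_slow / startup_seconds_fast
```

Even under these modest assumptions, a single slow cold-start wave costs tens of GPU-hours; repeated across the many waves of a training run, that is how "hours or even days" of compute disappear.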
If the sandbox is slow for one day, the model lags behind for one day. "The competition is that fierce now."
How the Million-Level Throughput Came About
The problem was there, and MiniMax and Tencent Cloud soon joined hands.
On March 18th, 2026, the two sides announced their cooperation: built on Tencent Cloud's Agent Runtime sandbox product, MiniMax deployed an Agent RL sandbox with million-level throughput and hundreds of thousands of concurrent instances, running smoothly in the test environment. A Dao said: "As far as we know, this is one of the largest training sandbox systems in China." Gary said the scale is at least an order of magnitude beyond its peers.
The most intuitive figure is startup speed: reduced from several minutes to hundreds of milliseconds, an improvement of several orders of magnitude. The earlier waste from idling GPUs was cut by an order of magnitude.
To support this scale, Tencent Cloud did a lot of hard work at the bottom. At the computing layer: scheduling optimization, kernel-lock optimization, snapshot technology, and memory mapping. At the storage layer, a dedicated accelerated storage solution was developed. Gary gave an analogy: "In the past, you bought a cloud disk. Now think of it as buying an image disk, a sandbox disk. The disk itself is the image."
He pointed out that everyone is now trying to "put new wine in old bottles", but the original design philosophies of the two old bottles (K8S and Serverless) are exactly opposite to the nature of the Agent. Imagine the Agent as an expert with memory who needs long stretches of secluded thinking. K8S, by convention, shuts the Agent down and restarts it, and the Agent instantly loses its memory. Serverless is like a voice-activated light that flicks on and off, but an Agent that is "writing a paper" needs the light to stay on.
This is why Tencent Cloud is building a new pipeline for Agent. Gary emphasized, "It's not because we're smarter than others, but because we truly recognize the problems and values here."
Regarding the training facilities the Agent needs, there's an easily overlooked difference in the industry: most AI companies solve the sandbox problem by setting up an environment locally. The process runs on their own machines, security is confirmed by hand, and the task stops when the computer is switched off.
Tencent Cloud takes a different path: it splits the entire Harness into a cloud-native architecture of "control plane + execution plane". The control plane handles orchestration, permissions, and auditing; the execution plane is the Agent Runtime sandbox. Each task runs in an independent, cloud-isolated environment, starts in milliseconds, and is discarded after use. Task state is persisted, so work can resume from the breakpoint even after the sandbox is destroyed. One approach is a "smart terminal with a seat belt"; the other is a "cloud factory with monitoring and isolation bays". For enterprise scenarios, where security, collaboration, and elasticity are all essential, the latter is the production-grade solution.
The Same Wall
Tang Daosheng's judgment was straightforward: the reasoning abilities of mainstream large models are all good, and the gap between domestic open-source models and overseas closed-source models is narrowing. The focus of competition is shifting: not "whose model is stronger", but who can put the model to good use through engineering.
At the summit, he broke the "engineering problem" down in detail: for a model to be truly deployed, it needs tool-calling ability, context management, long-term memory, a secure execution environment, and workflow orchestration. He summarized all of these with a single word: Harness, the model's "scaffolding". Tang Daosheng made it clear that what Tencent Cloud wants to do is not sell computing power, but help enterprises build this scaffolding.
The industry is converging on a formula: Agent = Model + Harness. The Model is responsible for "thinking"; the Harness is responsible for making the intelligence "useful" through tool calling, a code-execution sandbox, context engineering, long-term memory management, and workflow orchestration, a whole body of systems engineering. The model determines the lower limit of ability; the Harness determines the upper limit. There is practical data behind this: three months spent tuning prompts can improve quality by 20%, while two weeks spent building the Harness can raise the task completion rate from 35% to 82%.
This is not a discovery unique to Tencent.
In February 2026, Mitchell Hashimoto, co-founder of HashiCorp, formally proposed "Harness Engineering". Almost simultaneously, OpenAI ran a radical experiment: three engineers, five months, one million lines of code, zero lines written by hand, with humans designing only the Harness. Anthropic and LangChain reached similar conclusions.
Top engineering minds in different time zones have hit the same wall at the same time: the ceiling of model ability is still far away, but the floor of the engineering framework determines the actual result. The industry's focus is shifting from prompt engineering to context engineering: no longer just "how to write the instruction", but "how to build the entire information system the model sees".
A Dao put it visually: "It's like an F1 car. If we drive it, getting it back in one piece is already good; a real racing driver can set a world record. The same goes for the Agent today: can we build it an enterprise-grade F1 chassis, that is, the Harness?"
MiniMax's practice bears this out. With M2.7, the model formally began to self-evolve: AI is deeply involved in model training, and the Agent independently completes 50%-70% of the reinforcement-learning work. The role of human researchers has changed; they now discuss experiment ideas with the Agent.
But the prerequisite is a sandbox that is fast, stable, and large enough. The Harness is systems engineering comprising six components: file system, code execution, memory, search, context management, and orchestration. The sandbox is the most basic of them; every upper-layer capability rests on the premise that the model can really run in an environment. Otherwise, however delicate the Harness design, it gets stuck at the least conspicuous link. Tang Daosheng's line, "The implementation of AI is an engineering problem", refers not only to Harness design but also to the hard work of the underlying infrastructure.
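For illustration only, those six components can be sketched as one interface. The names and structure here are assumptions made for this sketch, not Tencent Cloud's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Harness:
    """Hypothetical shape of a Harness: the components that wrap one model."""
    file_system: dict[str, str] = field(default_factory=dict)  # workspace files
    tools: dict[str, Callable] = field(default_factory=dict)   # code execution, search, ...
    memory: list[str] = field(default_factory=list)            # long-term memory
    context: list[str] = field(default_factory=list)           # context management

    def register_tool(self, name: str, fn: Callable) -> None:
        self.tools[name] = fn

    def orchestrate(self, steps: list[str]) -> list[str]:
        """Workflow orchestration: run named tools in order, collecting results."""
        results = []
        for name in steps:
            out = self.tools[name]()
            self.memory.append(out)  # every step leaves a trace in memory
            results.append(out)
        return results
```

Note that `orchestrate` presumes its tools can actually execute, which is the article's point: the sandbox underneath is what makes every other component real.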
The Affinity of Two Groups of Engineers
At the summit, A Dao showed the whiteboard photo from MiniMax's first day. "Tencent Cloud supported us from the very first day of our founding. We were an unknown small company then, but Tencent Cloud didn't slight us for our size. We built a training compute cluster together and served our first hit product together."
Over four years, the cooperation has expanded from the compute cluster to the Agent RL sandbox, global compliance, and upper-layer application access. A Dao said Tencent Cloud is "highly technology-driven, with an agent-first mindset". It may sound like a compliment, but against what actually happened it points to a specific pattern of behavior: when facing problems, don't take detours, don't wait for standards, get your hands dirty first.
Since K8S couldn't run it, they designed a dedicated sandbox from scratch together. Before the industry recognized the value of Agent infrastructure, they invested in it themselves. That kind of rapport is built in projects, not negotiated in meeting rooms.
When MiniMax open-sourced its model, Tencent Cloud distributed the model service through TokenHub. A Dao said: "Even though we're listed now, we're still a small company of a few hundred people. We can't serve that many big customers ourselves." Tencent Cloud delivers the model to more customers; in turn, MiniMax's extreme training requirements push Tencent Cloud to evolve toward the Agent era.
And MiniMax is not the only one hitting this wall. Any company seriously working on Agentic RL will sooner or later encounter the same sandbox bottleneck. The only difference is whether someone has blazed the trail first.
Gary said: "We're at the critical point between two eras, joining hands." A Dao put it similarly: "A new era is replacing the old. Actually, we're on the same side."
The Hidden Card
MiniMax's M2.7 ranks first among domestic models on the AA large-model leaderboard, and its gap with Claude on SWE-bench Verified is only 0.6%.
A Dao's judgment: "In one or two years, no more than five companies may remain at the table." What keeps a company at the table? In this era, there is basically one yardstick for engineers: how many Agents they can run concurrently on their behalf, and how many tokens those Agents consume each day.
The yardstick applies to individuals and to companies alike. And the real bottleneck in training efficiency is not the GPU; it's the sandbox.
Tang Daosheng also announced at the Shanghai Summit that Tencent Cloud will fully open-source its underlying platform, Cube, which enterprises can use directly for Agent training and deployment. This is one of Tencent's answers to the "sandbox dilemma": open-sourcing makes the scaffolding of large models easier to use.
Tencent has laid its AI cards on the table: the product matrix, the ecosystem panorama, the IM entry point, the Skill toolbox, all presented to users. Open-sourcing Cube is a different gesture, aimed not at users but at allies. It says not "look what we have", but "take these capabilities and use them".
The revealed cards display strength; the hidden card shows the real foundation. One is the breadth of the product ecosystem, the other the depth of the engineering infrastructure.
Tang Daosheng's line, "The implementation of AI is an engineering problem", perhaps deserves an extra half-sentence: engineering problems are never solved by one party working alone.
Four years ago, "Next-Generation AI" was written on a whiteboard. Four years on, the words remain the same. But a few more people have written them.
This article is from the WeChat official account "New Berry" (author: Si Xiaobai), published by KrASIA with authorization.