
Speed is key: Anthropic says agents can now reach production in just a few days.

Zimu AI · 2026-04-09 16:41
Anthropic has come up with something new again.

Early in the morning, Anthropic released Claude Managed Agents, a set of managed tools for building and deploying cloud-based intelligent agents.

In the simplest terms, developers no longer need to handle infrastructure such as security, state management, and permissions, and can directly run agents in a production environment.

According to Anthropic, an agent that previously took months to go live can now reach production in just a few days.

01 Into production at 10x the speed

Until now, the real difficulty in agent development has never been the model itself, but the engineering around it.

We can quickly create a seemingly good demo using Claude or other large models: it can write code, analyze documents, and even automatically call tools. But once we want to turn it into a product that can run stably, problems arise.

It seems to be able to do everything, but it's really hard to use in a production environment.

After the demo is built, developers still need to assemble a whole set of infrastructure themselves: a secure code execution environment, long-term state management, permission control between different tools, and a mechanism for recovering from errors. None of these capabilities is very complex on its own, but together they add up to a large, time-consuming project.

What's more troublesome is that these tasks are almost impossible to reuse.

Once the model is upgraded, the assumptions hard-coded into the harness often stop holding, and it is difficult to share a stable runtime framework across different agents.

Agent development keeps going in circles: every team solves the same class of problems, yet few ever make them stable.

Therefore, we often see that agents can easily create demos, but it takes a long time to go live.

Claude Managed Agents aims to solve exactly this problem: secure execution, state management, permission control, error recovery... everything that teams previously had to build themselves is packaged up and provided uniformly by Anthropic.

Developers no longer need to care how the agent runs. They just specify what it should do, which tools it may use, and what restrictions apply; the rest of the execution is handled automatically by the system.
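This declarative style can be sketched as follows. To be clear, this is a hypothetical illustration: `AgentSpec`, its fields, and `validate_spec` are all invented names, not Anthropic's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    task: str                                           # what to do
    allowed_tools: list = field(default_factory=list)   # what it may use
    limits: dict = field(default_factory=dict)          # what restrictions apply

def validate_spec(spec: AgentSpec) -> bool:
    """A spec needs a task and an explicit tool whitelist."""
    return bool(spec.task) and len(spec.allowed_tools) > 0

spec = AgentSpec(
    task="Summarize last week's error reports",
    allowed_tools=["read_logs", "write_report"],
    limits={"max_runtime_hours": 2, "max_tool_calls": 100},
)
```

The point of the pattern is that everything below this declaration — sandboxing, scheduling, recovery — belongs to the platform, not the developer.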

The change it brings is also very direct: a whole set of systems that originally took months to build has now become an interface that can be quickly tried and repeatedly called.

It doesn't make the agent smarter, but it significantly shortens the distance between the demo and production.

Beyond serving as a "toolkit" for accelerating launches, it also does the following:

First, it supports long-running tasks. Agents can run autonomously in the background for hours; progress and output are saved continuously and survive interruptions.
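The "progress survives interruption" behavior boils down to a checkpoint-and-resume pattern. A minimal sketch, assuming a simple JSON file as the checkpoint store (the file format and step shape are illustrative, not the platform's internals):

```python
import json
import os
import tempfile

def run_with_checkpoints(steps, checkpoint_path):
    """Run steps in order, persisting each result so a crash can resume."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)              # resume: reload finished work
    for i, step in enumerate(steps):
        key = str(i)
        if key in done:
            continue                         # skip work completed before an interruption
        done[key] = step()
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)               # checkpoint after every step
    return [done[str(i)] for i in range(len(steps))]

path = os.path.join(tempfile.mkdtemp(), "progress.json")
results = run_with_checkpoints([lambda: "fetched", lambda: "analyzed"], path)
```

Rerunning with the same checkpoint path skips already-completed steps instead of redoing them.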

Second, it supports collaboration between multiple agents. Agents can create and schedule other agents to handle complex tasks in parallel. This capability is currently offered as a research preview and requires a separate application for access.
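The fan-out shape of that collaboration can be sketched with plain Python concurrency. `coordinator` and `sub_agent` are stand-in names for illustration; the actual research-preview API is not public in this article.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask: str) -> str:
    """Stand-in for a spawned agent working on one slice of the task."""
    return f"done: {subtask}"

def coordinator(task: str, subtasks: list) -> dict:
    """Fan subtasks out to sub-agents in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(sub_agent, subtasks))   # map preserves input order
    return {"task": task, "results": results}

report = coordinator("quarterly review", ["sales", "costs", "forecast"])
```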

Then there is an access and governance mechanism for real systems. Agents can reach real systems through scoped permissions, identity management, and execution tracking — but which tools can be called, which permissions are granted, and which credentials are used is not decided by the model itself.
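The governance idea — policy decided outside the model, every call traced — can be sketched like this. The policy table, agent IDs, and audit format are all assumptions for illustration:

```python
POLICY = {
    "support-agent": {"read_tickets", "draft_reply"},
    "billing-agent": {"read_invoices"},
}
AUDIT_LOG = []   # execution tracking: every attempt is recorded

def call_tool(agent_id: str, tool: str) -> str:
    """Gate each tool call through an external policy, not the model."""
    allowed = tool in POLICY.get(agent_id, set())
    AUDIT_LOG.append({"agent": agent_id, "tool": tool, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{agent_id} may not call {tool}")
    return f"{tool} executed"
```

Even denied attempts land in the audit log, which is what makes after-the-fact tracing possible.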

In addition, the system has a built-in orchestration mechanism for task execution, which decides when to call tools, how to manage context, and how to recover from errors. Developers don't need to orchestrate the agent's execution manually; the system schedules it automatically at runtime.
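The error-recovery part of such an orchestrator is essentially a retry loop around each step. A minimal sketch, where the retry count and the `flaky_tool` stand-in are invented for illustration:

```python
def run_step(step, max_retries=3):
    """Retry a step on transient failure; give up after max_retries."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return step()
        except RuntimeError as e:
            last_error = e
    raise last_error

calls = {"n": 0}
def flaky_tool():
    """Fails twice, then succeeds — a stand-in for a transient fault."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_step(flaky_tool)
```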

These capabilities themselves are not new, but when they are put into the same system, a lot of things are saved.

02 Not just usable, but already in use

In the release, Anthropic also presented a number of implemented cases, basically covering typical scenarios such as collaboration tools, enterprise systems, and development tools.

For example, Notion (a collaboration tool that integrates documents, knowledge bases, and project management) integrated Claude directly into its workspace: engineers ask it to write code, the content team asks it to build websites and make slide decks, and multiple tasks run in parallel.

On the enterprise side, Rakuten (a large Japanese internet and e-commerce group with businesses spanning e-commerce, finance, and communications) has deployed agents across multiple departments, including product, sales, marketing, finance, and human resources. Their approach is straightforward: connect the agents to Slack and Teams, let employees hand over tasks the way they would assign work, and get back spreadsheets, slides, and even applications. Rakuten says an agent can be deployed within a week.

Asana (a software company that provides team task management and project collaboration tools) has a more radical idea. A project-management company at its core, it is now simply turning agents into project members that participate directly in driving tasks and producing content — and the name is equally blunt: AI Teammates.

The representative on the developer side is Sentry (a developer tool that provides error monitoring and performance analysis). It was originally used to monitor bugs; now its agents can automatically generate fix code and open pull requests, closing the loop from problem discovery to fix submission.

There is also Vibecode (an AI development tool platform that generates and deploys applications through natural language). This kind of AI-native tool goes a step further: users write a requirement, and an application is generated and deployed directly from the prompt — with Managed Agents as its default underlying infrastructure.

From all these, we can see that whether it's writing code, creating content, or handling enterprise processes, agents have started to directly take over tasks.

In a sense, when security, state, permissions, and scheduling become default capabilities, agents no longer need to be "packaged" into a system; they can run as a system themselves.

Agents have never lacked capabilities; they just have difficulty being implemented.

In the past, developers needed to build a whole set of frameworks first before agents could start working. Now, this framework already exists in advance, and agents can be directly deployed into it.

This is the significance of Claude Managed Agents.

03 The tool is good, but the problems are just beginning

As soon as Claude Managed Agents was launched, it sparked a lot of discussions.

Many people were surprised by Anthropic's pace, and their feeling is captured by a familiar meme: every time you wake up, there's another Claude update.

Sure enough: right after the leak incident came the Claude Code 2.1.90 update, and before the buzz around Claude Mythos Preview had faded, Claude Managed Agents arrived.

Anthropic, just keep launching, and we're not tired at all.

Just kidding. Even as people marveled at the release pace, doubts about the new tool surfaced almost simultaneously.

The most direct question is whether it can really run "long-term tasks" well.

Some developers pointed out that the biggest challenge for agents has never been short tasks, but scenarios that require continuous operation and repeated decision-making. Once runtime stretches out, errors accumulate and system stability degrades rapidly.

Being able to run doesn't mean being able to run for a long time.

Furthermore, there is the issue of "reliability".

In small-scale tests, agents often perform well. But once they enter a real production environment, task complexity increases, call chains grow longer, and edge cases keep emerging.

This is exactly where most agent platforms are most likely to fail.

Some people asked a more practical question: since there is now the multi-agent capability, can it directly replace the existing workflow tools?

Or, is a system like n8n still necessary?

Essentially, they are asking the same thing: n8n exists to keep processes stable, controllable, and reproducible. To replace it, this multi-agent coordination system must be at least as stable and reliable.

It's worth noting that Anthropic is also trying to solve this problem in its engineering design.

In the latest technical article, they split the agent system into three independent parts: the model and scheduling logic ("the brain"), the execution environment and tools ("the hands"), and the session log that records the entire process.

The three are connected through interfaces; if any layer fails, it can be recovered independently without bringing down the whole run.

This design turns the agent from a one-time execution process into a system that can be interrupted, recovered, and even restarted.
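The three-part split can be sketched in a few lines. Here `brain`, `hands`, and the log are deliberately toy stand-ins for the real components; the interfaces and the fixed action plan are assumptions made for illustration. The key property is that all state lives in the session log, so a restart with fresh components picks up exactly where the log left off:

```python
def brain(log):
    """'The brain': pick the next action from what the log says is done."""
    plan = ["gather", "analyze", "report"]
    done = [entry["action"] for entry in log]
    for action in plan:
        if action not in done:
            return action
    return None   # everything finished

def hands(action):
    """'The hands': actually execute the chosen action."""
    return f"{action} done"

def run(log):
    """Drive the loop; state lives only in the session log, not in either layer."""
    while (action := brain(log)) is not None:
        log.append({"action": action, "result": hands(action)})
    return log

full = run([])
# Simulate a crash after one step, then a restart with the surviving log:
resumed = run(full[:1])
```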

In addition, for tasks that need to run for a long time, Anthropic doesn't put all the information into the model's context but records it in an external log and retrieves it when needed, so that the context window won't be filled up.
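The retrieval idea — keep history external and pull in only what's relevant — can be sketched with a deliberately simple keyword match standing in for whatever retrieval the platform actually uses; the log entries and `retrieve` function are illustrative assumptions:

```python
SESSION_LOG = [
    {"id": 1, "text": "fetched sales data for Q1"},
    {"id": 2, "text": "summarized support tickets"},
    {"id": 3, "text": "fetched sales data for Q2"},
]

def retrieve(query: str, log=SESSION_LOG, limit=2):
    """Pull only the matching entries into context, not the whole history."""
    hits = [entry for entry in log if query in entry["text"]]
    return hits[-limit:]   # keep just the most recent matches
context = retrieve("sales")
```

However long the log grows, the context handed to the model stays bounded by `limit`.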

Similarly, permissions are no longer stored by the model but are isolated separately, so that even if there is an error, sensitive information won't be directly exposed.
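This isolation pattern is sometimes called a credential broker: the agent only ever holds an opaque ID, and a separate component resolves it to the real secret at execution time. The vault, tool, and names below are invented for illustration:

```python
VAULT = {"crm-token": "s3cret-value"}   # secrets live outside the agent

def crm_tool(secret: str) -> str:
    """Stand-in for a tool that needs a real credential to run."""
    return f"called CRM with a {len(secret)}-char token"

def broker_call(tool, credential_id: str) -> str:
    """Resolve the opaque ID to a secret only at execution time."""
    secret = VAULT.get(credential_id)
    if secret is None:
        raise KeyError(f"unknown credential: {credential_id}")
    return tool(secret)

# The agent's own state holds only the opaque ID, never the secret:
agent_view = {"tool": "crm", "credential_id": "crm-token"}
result = broker_call(crm_tool, agent_view["credential_id"])
```

Because the secret never enters the agent's state, a misbehaving agent can leak at most the ID, which is useless without the broker.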

However, engineering design can only solve structural problems and can't guarantee the results.

It can be said that people don't doubt what Claude Managed Agents can do, but they doubt whether it can do it stably and controllably all the time.

This needs time to verify.

This article is from the WeChat official account "Zimu AI", author: Yuan Xinyue. Republished by 36Kr with permission.