Hundreds of Agents: How Do You Manage Them? A New Approach from a Tsinghua Team

Tsinghua University and Sun Yat-sen University Open-Source OpenRath: A Session-Centric Multi-Agent Framework

As the number of Agents grows, managing them turns chaotic. OpenRath replaces the Agent-centered design with one built around the Session, so that multiple Agents can share state and collaborate under clearer control.

As the number of Agents grows, Sessions become increasingly chaotic.

This is a wall that almost everyone hits after truly scaling up a multi-agent system.

One Agent maintains its own context while another keeps a duplicate copy of the history; a task branches into several reasoning paths, and in the end no one can tell which branch produced the final answer; model calls, tool executions, sandbox environments, and long-term memories each manage their own state. The demo runs fine, but once the system scales to dozens or hundreds of Agents, debugging, reproduction, and orchestration all start to slip out of control.

Recently, a team from Tsinghua University and Sun Yat-sen University (the Rath Team) open-sourced its solution, OpenRath, which it describes as a PyTorch-style runtime for multiple agents and multiple sessions.

Its central claim: stop building everything around the Agent. The thing that truly deserves to be treated as a first-class citizen is the Session.

Official website: https://www.openrath.com/

Documentation: https://docs.openrath.com/

Blog: https://blog.openrath.com/

GitHub: https://github.com/Rath-Team/OpenRath

Open-source license: BSD-3-Clause | Current version: v1.2.1 (PyPI)

OpenRath v1.2.1 is currently available on PyPI and can be installed with pip install openrath. It is released under the BSD-3-Clause license, and the official website, documentation, blog, and GitHub repository are all live.

This article starts from the "why" and works through to the "how", focusing on how OpenRath differs from frameworks like AutoGen and LangGraph, and why it dares to borrow PyTorch's name.

At runtime, Agents still "think" in chat history

The fundamental question: When an Agent actually takes action, where should the evidence be stored?

First-generation large-model applications can be summarized as "prompt in, answer out". Agent systems have moved this boundary.

A useful Agent does more than just produce text. It retrieves information, plans, calls tools, reads files, writes code, queries APIs, runs tests, operates browsers, and sometimes modifies external state. ReAct alternates reasoning and action in a loop, Toolformer teaches the model when to call tools, and the Model Context Protocol turns tools into protocol-level boundaries. This line of work keeps moving forward.

But once an Agent actually takes action in the world, a runtime-level question arises: Where exactly should the evidence of these actions be stored?

If a tool call reads a file, we need its parameters and results; if it modifies a repository, we need the diff; if it runs in a sandbox, we need the sandbox's identity; if it fails and retries, we need the failed path; if someone approves or rejects an action, we need the verification signal. A chat history can at best describe these things; it is not enough to reconstruct them.
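To make the contrast concrete, here is a minimal sketch, in plain Python with hypothetical names (`ToolEvent`, `Session`, `failures`), of a session that stores structured evidence records instead of chat text. It is not OpenRath's API; it only shows why parameters, results, diffs, sandbox identity, retries, and verdicts each need a field of their own.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical evidence record: one entry per action, not per chat message.
@dataclass(frozen=True)
class ToolEvent:
    tool: str                      # which tool was called
    args: dict[str, Any]           # exact parameters, so the call can be re-run
    result: Any                    # raw result, not a paraphrase
    sandbox_id: str                # where the call actually executed
    attempt: int = 1               # each retry gets its own entry
    ok: bool = True                # failed paths are kept, not overwritten
    diff: Optional[str] = None     # repository change, if any
    verdict: Optional[str] = None  # e.g. "approved" / "rejected" from a reviewer

@dataclass
class Session:
    events: list[ToolEvent] = field(default_factory=list)

    def record(self, event: ToolEvent) -> None:
        self.events.append(event)

    def failures(self) -> list[ToolEvent]:
        # A chat log can only mention a failure; a structured record can be queried.
        return [e for e in self.events if not e.ok]

s = Session()
s.record(ToolEvent("run_tests", {"path": "tests/"}, "2 failed", "sbx-1", ok=False))
s.record(ToolEvent("run_tests", {"path": "tests/"}, "all passed", "sbx-1", attempt=2))
print([(e.attempt, e.result) for e in s.failures()])  # [(1, '2 failed')]
```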

Let's take a specific example.

A software task: the research Agent reads the issue and retrieves notes; the coding Agent modifies the repository; the sandbox runs the tests; the verification Agent rejects the first version of the patch, so the workflow branches; the memory backend records this failure so it is not repeated. If these events are scattered across separate logs, the final answer is almost the least important product. What is truly valuable is the evidence chain showing how the work progressed step by step.
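The software task above can be sketched as an evidence chain in which every step points at the step it built on. This is a hypothetical illustration (`Step`, `lineage` are invented names, not OpenRath classes); it shows how parent pointers answer "which branch produced the final answer" even after the workflow forks.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical node in an evidence chain: each step records who produced it and what it built on.
@dataclass(frozen=True)
class Step:
    id: str
    actor: str              # which Agent, tool, or backend produced this step
    note: str
    parent: Optional[str]   # None for the root of the task

def lineage(steps: dict[str, Step], leaf: str) -> list[str]:
    """Walk parent pointers from a result back to the task's root."""
    path = []
    cur: Optional[str] = leaf
    while cur is not None:
        path.append(f"{steps[cur].actor}: {steps[cur].note}")
        cur = steps[cur].parent
    return list(reversed(path))

chain = [
    Step("s1", "research", "read issue, retrieved notes", None),
    Step("s2", "coder", "patch v1", "s1"),
    Step("s3", "sandbox", "tests run on patch v1", "s2"),
    Step("s4", "verifier", "rejected patch v1", "s3"),
    Step("s5", "memory", "recorded failure of patch v1", "s4"),
    Step("s6", "coder", "patch v2 (branch from s1)", "s1"),
    Step("s7", "verifier", "approved patch v2", "s6"),
]
steps = {s.id: s for s in chain}
for line in lineage(steps, "s7"):
    print(line)
```

Because the rejected branch (s2 to s5) is kept rather than overwritten, its failure stays available as evidence while the accepted answer traces cleanly back through s6.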

This is the starting point of OpenRath: Treat the Session as the carrier of evidence, not just the chat history.

Why an Agent Cluster?

A single Agent eventually swells into one huge prompt, so the work has to be split up.

Early on, a single Agent was basically sufficient: it received input, understood the task, called tools, and returned results, like an enhanced chatbot. But real-world tasks quickly exceeded the capabilities of a single Agent.

A non-trivial software engineering task often needs to be split into requirement understanding, data retrieval, architecture design, code implementation, test verification, and result review. Different stages call for different strengths: some Agents are good at planning, some at coding, and some at catching errors. If a single Agent keeps handling everything, it swells into one huge prompt and an increasingly chaotic context window.

Hence the Agent Cluster: the Planner, Researcher, Coder, Reviewer, Executor, and Memory Agent each perform their own duties and collaborate around a complex goal.

Multiple specialized Agents collaborate around a shared Session: each reads the current state, completes a local task, and writes the result back for the next Agent to take over.

But as soon as such a cluster actually runs, hard questions arise: How do these Agents share context? Which Agent, which branch, and which tool call did a given conclusion come from? If an Agent makes a mistake, can we roll back to the corresponding branch and start over?
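The provenance and rollback questions above can be sketched with a shared session whose entries are tagged by author and branch. All names here (`SharedSession`, `write`, `who_said`, `rollback`) are hypothetical and chosen for illustration; this is a toy model, not how OpenRath stores state.

```python
from dataclasses import dataclass, field

# Hypothetical shared session: every entry records which Agent wrote it and on which branch.
@dataclass
class SharedSession:
    entries: list[tuple[str, str, str]] = field(default_factory=list)  # (agent, branch, content)

    def write(self, agent: str, branch: str, content: str) -> int:
        self.entries.append((agent, branch, content))
        return len(self.entries)  # checkpoint = number of entries so far

    def who_said(self, content: str) -> tuple[str, str]:
        # Answers "which Agent and which branch did this conclusion come from?"
        for agent, branch, text in self.entries:
            if text == content:
                return agent, branch
        raise KeyError(content)

    def rollback(self, checkpoint: int) -> "SharedSession":
        # Returns a new session truncated at the checkpoint; the original is untouched.
        return SharedSession(list(self.entries[:checkpoint]))

s = SharedSession()
s.write("planner", "main", "plan: fix parser")
cp = s.write("researcher", "main", "root cause: off-by-one")
s.write("coder", "main", "patch: change < to <=")      # suppose this step is wrong
print(s.who_said("patch: change < to <="))           # ('coder', 'main')
retry = s.rollback(cp)                                # restart from before the coder's step
print(len(retry.entries), len(s.entries))             # 2 3
```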

Put simply, the real challenge of an Agent Cluster has never been "creating more Agents"; the hard part is managing how state flows between them.

OpenRath asks one more question

While others focus on how Agents communicate, it asks who owns the work once the communication is over.

The term multi-agent often makes people think of a group chat: one Agent proposes, one criticizes, one executes, and one supervisor decides when to end. This model is useful, but not enough.

There has been a lot of work in this area: AutoGen turns multi-Agent conversations into a practical programming model; CrewAI separates Agent teams from more structured processes; LangGraph uses graph states and supervisor nodes to express routing and control. They all solve the problem of how Agents communicate.

OpenRath then asks one more question: After the Agents finish communicating, who owns the state of this work?

A production-level Agent Cluster has to decide which Agent the current Session should be assigned to, what context that Agent should see, which memories it has read, in which sandbox the next command should run, and what verification signals are needed before continuing. These are all control-plane problems that cannot be solved by adding another role to the group chat. OpenRath's answer is to make the Session the unit of routing and the Session Graph the control plane: Agents, tools, workflows, memories, and sandbox placements all converge on this graph.
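A toy version of "the Session is the unit of routing" might look like the sketch below: a selector inspects only the Session's state (memories read, sandbox, test status, verification) to decide which Agent gets it next. The field names, the `route` function, and the three stand-in Agents are hypothetical and do not reflect OpenRath's actual Selector interface.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

# Hypothetical session state that a control plane could route on.
@dataclass(frozen=True)
class Session:
    task: str
    sandbox: str = "local"                # where the next command should run
    memories: tuple[str, ...] = ()        # memories already recalled
    tests_passed: Optional[bool] = None   # None = tests not run yet
    verified: bool = False
    trace: tuple[str, ...] = ()           # which Agent handled each step

# Stand-in Agents: each takes a Session and returns a new one.
def recall(s: Session) -> Session:
    return replace(s, memories=s.memories + ("past failure: flaky test",), trace=s.trace + ("recall",))

def code_and_test(s: Session) -> Session:
    return replace(s, tests_passed=True, trace=s.trace + (f"coder@{s.sandbox}",))

def verify(s: Session) -> Session:
    return replace(s, verified=bool(s.tests_passed), trace=s.trace + ("verifier",))

AGENTS: dict[str, Callable[[Session], Session]] = {
    "recall": recall, "coder": code_and_test, "verifier": verify,
}

def route(s: Session) -> Optional[str]:
    """Control-plane decision: pick the next Agent from the Session's state alone."""
    if not s.memories:
        return "recall"
    if s.tests_passed is None:
        return "coder"
    if not s.verified:
        return "verifier"
    return None  # nothing left to do

def run(s: Session, max_steps: int = 10) -> Session:
    for _ in range(max_steps):  # step cap guards against routes that never settle
        nxt = route(s)
        if nxt is None:
            break
        s = AGENTS[nxt](s)
    return s

final = run(Session(task="fix issue", sandbox="container-a"))
print(final.trace)  # ('recall', 'coder@container-a', 'verifier')
```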

In a nutshell: An Agent cluster is not a group chat, but a runtime control plane built on persistent Session states.

This is also why multi-agent systems can be divided into four quadrants along two dimensions, number of Agents × number of Sessions:

A single Agent with a single Session is like a ChatGPT-style chat; multiple Agents with a single Session is sub-agent collaboration; a single Agent with multiple Sessions is like OpenClaw's branch fan-out; and multiple Agents with multiple Sessions (MAMS) is the direction that OpenRath is targeting.

OpenRath calls this approach MAMS (Multi-Agent Multi-Session). It makes a clear judgment: what really needs to be forked, merged, reused, and tracked is the entire Session data stream, not the message lists maintained inside each Agent.
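Forking and merging the whole Session stream, rather than per-Agent message lists, can be illustrated with the hypothetical sketch below (`Session.fork` and `merge` are invented for this example). The merge strategy is deliberately naive: it assumes every branch was forked from the same base and only appended to it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Session:
    messages: list[str]
    parent: Optional["Session"] = None  # lineage: which Session this one was forked from

    def fork(self) -> "Session":
        # The child copies the whole data stream, not one Agent's private message list.
        return Session(list(self.messages), parent=self)

def merge(base: Session, *branches: Session) -> Session:
    """Keep the shared prefix once, then append each branch's new messages in order.

    Assumes every branch was forked from `base` and only appended to it.
    """
    n = len(base.messages)
    merged = list(base.messages)
    for b in branches:
        merged.extend(b.messages[n:])
    return Session(merged, parent=base)

root = Session(["task: compare two designs"])
a, b = root.fork(), root.fork()           # two Agents work on separate Sessions
a.messages.append("agent A: design 1 is simpler")
b.messages.append("agent B: design 2 scales better")
m = merge(root, a, b)
print(m.messages)
```

The root Session is never mutated by either branch, so it can be reused as the starting point for further forks.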

In other words, most frameworks gather a room full of smart workers, while OpenRath first builds the workstations, work orders, and assembly lines. As the team puts it: Agents are the workers, and Sessions are the work itself.

Build an Agent cluster like PyTorch

This is not just borrowing the name.

OpenRath has adopted the three design features that make PyTorch so useful.

OpenRath's smartest move is to apply the abstractions deep learning developers know best to Agent systems.

Why is PyTorch so useful? Because it breaks complex computations into clear building blocks: Tensors are the data that flows, Modules/Layers are the composable units that transform this data, the device determines where the computation takes place, and the computation graph is built dynamically at runtime. OpenRath maps these almost one-to-one onto Agent systems:

Core mapping:
Tensor → Session
Module / Linear → Workflow / Agent
Device → Sandbox / Backend
Parameter → Memory
Function → Tool
Control flow → Selector

This mapping is not just a gimmick. Boiled down, PyTorch teaches OpenRath three things, and they are the three pillars for understanding OpenRath. The next three sections take the most important correspondences in the table one at a time, moving from the terminology to the reasoning behind each design.

Pillar 1: Agents are transformation layers, not all-round assistants

Layers do not hold data; Agents do not hold states.

In PyTorch, nn.Linear is not an application; it is just a transformation layer: it takes in a Tensor and outputs a Tensor. The capabilities of a network come from stacking many such layers.

OpenRath designs Agents the same way. An Agent is a transformation layer over the Session. At its core is a forward(session) -> session method: it takes in a Session and returns a Session.
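The layer idea can be sketched in a few lines, assuming an immutable Session and an nn.Module-style `__call__` that delegates to `forward`. The `Agent` and `Echo` classes below are hypothetical stand-ins, not OpenRath's classes; the point is that the layer carries configuration only, while all conversation state lives in the Session it receives and returns.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Session:
    messages: tuple[str, ...] = ()

class Agent:
    """Hypothetical base layer: holds no conversation state of its own."""
    def forward(self, session: Session) -> Session:
        raise NotImplementedError

    def __call__(self, session: Session) -> Session:
        # Mirrors nn.Module: calling the layer runs forward.
        return self.forward(session)

class Echo(Agent):
    def __init__(self, name: str):
        self.name = name  # configuration only, like a layer's hyperparameters

    def forward(self, session: Session) -> Session:
        last = session.messages[-1] if session.messages else ""
        return replace(session, messages=session.messages + (f"{self.name}: saw '{last}'",))

s0 = Session(("user: hello",))
s1 = Echo("a")(s0)
print(s1.messages[-1])  # a: saw 'user: hello'
print(s0.messages)      # ('user: hello',)  the input Session is not mutated
```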

The key point is that there is more than one kind of transformation layer. The same forward(session) -> session signature accommodates completely different tasks:

An Agent calls tools, modifies files in the workspace, and writes the execution results back to the Session;
A Compressor condenses a session that has run for dozens of rounds into a concise message (Lesson 8 of the official examples covers this);
An Agent recalls memories before running and commits memories after running, which is equivalent to "indexing and archiving" this session;
You can also write an Agent that only does summarization, verification, or rewriting.
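As one example of a different job behind the same interface, here is a hypothetical compressor layer. The `Compressor` class and its `keep_last` parameter are invented for illustration, and `summarize` is a placeholder string where a real implementation would presumably call a model; it is not a reproduction of the official Lesson 8 code.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Session:
    messages: tuple[str, ...] = ()

class Compressor:
    """Hypothetical layer: same forward(session) -> session shape, different job."""
    def __init__(self, keep_last: int = 2):
        self.keep_last = keep_last  # recent messages kept verbatim

    def summarize(self, old: tuple[str, ...]) -> str:
        # Placeholder summary; a real compressor would call a model here.
        return f"[summary of {len(old)} earlier messages]"

    def forward(self, session: Session) -> Session:
        msgs = session.messages
        if len(msgs) <= self.keep_last:
            return session  # nothing worth compressing
        cut = len(msgs) - self.keep_last
        return replace(session, messages=(self.summarize(msgs[:cut]),) + msgs[cut:])

    __call__ = forward  # callable like any other layer

long = Session(tuple(f"turn {i}" for i in range(30)))
short = Compressor(keep_last=2)(long)
print(short.messages)  # ('[summary of 28 earlier messages]', 'turn 28', 'turn 29')
```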

They all share the same external interface, so they can be stacked and nested freely, like the layers of a neural network. This is what Workflow (the counterpart of nn.Module) is for: any subclass that implements forward(session) -> session can chain multiple Agents, fork Sessions, compress the context, call tools, and dispatch to sub-workflows. Since every layer takes and returns a Session, Workflows can be nested layer by layer like nn.Modules, and no layer has to invent its own state format.
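Nesting can be sketched with two hypothetical Workflow shapes modeled on nn.Sequential: one runs child layers in order, the other forks the Session to several branches and joins their new messages. `say`, `Sequential`, and `FanOut` are invented names; OpenRath's own Workflow base class may look quite different.

```python
from dataclasses import dataclass, replace
from typing import Callable

@dataclass(frozen=True)
class Session:
    messages: tuple[str, ...] = ()

Layer = Callable[[Session], Session]

def say(text: str) -> Layer:
    # Hypothetical minimal "Agent": appends one message.
    return lambda s: replace(s, messages=s.messages + (text,))

class Sequential:
    """Hypothetical Workflow: runs child layers in order, like nn.Sequential."""
    def __init__(self, *layers: Layer):
        self.layers = layers

    def __call__(self, s: Session) -> Session:
        for layer in self.layers:
            s = layer(s)
        return s

class FanOut:
    """Hypothetical Workflow: forks the Session to each branch, then joins the new messages."""
    def __init__(self, *branches: Layer):
        self.branches = branches

    def __call__(self, s: Session) -> Session:
        n = len(s.messages)
        extra: tuple[str, ...] = ()
        for branch in self.branches:
            extra += branch(s).messages[n:]  # every branch sees the same input Session
        return replace(s, messages=s.messages + extra)

inner = Sequential(say("coder: patch"), say("tester: pass"))
outer = Sequential(say("planner: plan"), FanOut(inner, say("reviewer: ok")), say("done"))
print(outer(Session()).messages)
```

Because `inner` is itself just a Session-to-Session callable, it drops into `outer` unchanged, which is the property the paragraph above attributes to Workflow nesting.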

Managing hundreds of Agents thus shifts from piecing together prompts to building modules. Layers do not hold data; the data lives in Tensors. Agents do not hold state; the state lives in Sessions.

There is also an easily overlooked benefit here. Because an Agent does not own the whole world (the Session loop is still the engine, the Sandbox is still where execution happens, and Memory is still an independent store), the single-Agent case stays simple, and the same Agent can be plugged directly into a larger Workflow without changing a single line of code.

As for tools, OpenRath abstracts them as FlowToolCall: on one side it holds the name, description, and JSON schema shown to the model, and on the other the actual behavior executed in Python, so what a tool looks like and what it does always stay together. File, shell, and code execution tools are built in, and MCP tools served over stdio can be adapted directly into the same loop. Underneath there is also a clear layering: FlowToolCall is the function visible to the model at the flow layer, while BackendTool* is the payload actually consumed by the sandbox backend.
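The "appearance and behavior kept together" idea, plus the flow/backend split, can be sketched as follows. `ToolSketch`, `model_view`, and `to_backend_payload` are hypothetical names, not the fields or methods of OpenRath's FlowToolCall or BackendTool* types; the `parameters` dictionary simply follows the common JSON Schema style used for function calling.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class ToolSketch:
    """Hypothetical stand-in for the FlowToolCall idea: what the model sees plus what Python runs."""
    name: str
    description: str
    parameters: dict[str, Any]   # JSON Schema for the arguments
    run: Callable[..., Any]      # the actual behavior

    def model_view(self) -> dict[str, Any]:
        # Only this part is shown to the model; the Python callable is never serialized.
        return {"name": self.name, "description": self.description, "parameters": self.parameters}

    def to_backend_payload(self, raw_args: str) -> dict[str, Any]:
        # Hypothetical lower layer: turn the model's JSON arguments into a payload for a backend.
        args = json.loads(raw_args)
        missing = [k for k in self.parameters.get("required", []) if k not in args]
        if missing:
            raise ValueError(f"missing arguments: {missing}")
        return {"tool": self.name, "args": args}

word_count = ToolSketch(
    name="word_count",
    description="Count words in a string.",
    parameters={"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
    run=lambda text: len(text.split()),
)

payload = word_count.to_backend_payload('{"text": "hello agent world"}')
print(sorted(word_count.model_view()))   # ['description', 'name', 'parameters']
print(word_count.run(**payload["args"]))  # 3
```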

Pillar 2: Sandbox and Memory are "pluggable backends"

Hardcoding the backend is like welding the model to the CPU.

The second smart thing about PyTorch is that it separates "where to compute" from "what to compute". The same model code can run on a GPU via .to("cuda"); switching backends is like swapping a graphics card, with no change to the computation logic. The device, or compute backend, is pluggable.

OpenRath applies this idea to the two places that are most likely to be hardcoded: the execution environment and long-term memory.

Sandbox (corresponding to Device): where the tools actually run. Many frameworks manage "chat history" and "where tools actually execute" separately. The model believes it is still in a certain workspace, while the shell or container has in fact switched.

OpenRath binds the Sandbox to the Session: tools run on the Session's current backend, and the returned Session remembers where it executed, so its location cannot drift silently.

The real ingenuity lies in making the Sandbox a pluggable backend: the local process is always available (session.to("local", spec="./")), and the containerized OpenSandbox is an option (pip install "openrath[opensandbox]"). In the future, any third-party execution backend can be used as long as it plugs into the same Session placement model. The execution environment is no longer hardcoded into a single shell.
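The placement idea can be sketched as follows. Only the call shape `session.to("local", spec="./")` is taken from the article; the `Session` class, the backend registry, and the `LocalEcho`/`FakeContainer` backends are hypothetical, and these toy backends only echo commands rather than executing anything.

```python
from dataclasses import dataclass, replace
from typing import Callable, Protocol

class Backend(Protocol):
    def run(self, command: str) -> str: ...

class LocalEcho:
    """Hypothetical local backend; it only echoes commands, it does not execute them."""
    def __init__(self, spec: str):
        self.spec = spec
    def run(self, command: str) -> str:
        return f"[local:{self.spec}] {command}"

class FakeContainer:
    """Hypothetical stand-in for a containerized backend."""
    def __init__(self, spec: str):
        self.spec = spec
    def run(self, command: str) -> str:
        return f"[container:{self.spec}] {command}"

# Pluggable registry: adding a backend does not touch Session or tool code.
BACKENDS: dict[str, Callable[[str], Backend]] = {"local": LocalEcho, "container": FakeContainer}

@dataclass(frozen=True)
class Session:
    log: tuple[str, ...] = ()
    device: str = "local"
    spec: str = "./"

    def to(self, device: str, spec: str) -> "Session":
        # Placement is part of the Session itself, like a tensor's device.
        if device not in BACKENDS:
            raise ValueError(f"unknown backend: {device}")
        return replace(self, device=device, spec=spec)

def run_tool(session: Session, command: str) -> Session:
    # Tools always run on the Session's current backend, and the result records where.
    backend = BACKENDS[session.device](session.spec)
    return replace(session, log=session.log + (backend.run(command),))

s = run_tool(Session().to("local", spec="./"), "ls")
s = run_tool(s.to("container", spec="img:py3"), "pytest")
print(s.log)  # ('[local:./] ls', '[container:img:py3] pytest')
```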

Memory (corresponding to Parameter): state retained across runs.

It is an independent layer of persistent state that can be bound to an Agent, recalled before running, and committed after running; it is neither discarded after use like tool results nor just a few lines of text stuffed into the prompt.
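A minimal sketch of the recall-before, commit-after cycle against a pluggable memory backend, assuming a simple two-method protocol. `MemoryBackend`, `InMemoryStore`, and `MemoryAgent` are hypothetical names; the keyword match stands in for whatever retrieval a real backend would use, and this is not OpenRath's memory interface.

```python
from dataclasses import dataclass, replace
from typing import Protocol

class MemoryBackend(Protocol):
    def recall(self, query: str) -> list[str]: ...
    def commit(self, item: str) -> None: ...

class InMemoryStore:
    """Hypothetical zero-dependency backend: a list that outlives individual Sessions."""
    def __init__(self) -> None:
        self.items: list[str] = []
    def recall(self, query: str) -> list[str]:
        # Naive keyword match; a real backend might use embeddings.
        return [m for m in self.items if query.lower() in m.lower()]
    def commit(self, item: str) -> None:
        self.items.append(item)

@dataclass(frozen=True)
class Session:
    task: str
    context: tuple[str, ...] = ()

class MemoryAgent:
    """Hypothetical Agent bound to a memory backend: recall, run, then commit."""
    def __init__(self, memory: MemoryBackend, keyword: str):
        self.memory = memory
        self.keyword = keyword

    def forward(self, s: Session) -> Session:
        recalled = tuple(self.memory.recall(self.keyword))   # before running
        s = replace(s, context=s.context + recalled)
        outcome = f"{self.keyword}: attempted '{s.task}'"   # stand-in for real work
        self.memory.commit(outcome)                         # after running
        return replace(s, context=s.context + (outcome,))

store = InMemoryStore()
agent = MemoryAgent(store, keyword="parser")
first = agent.forward(Session("fix parser bug"))
second = agent.forward(Session("fix parser bug again"))  # a new Session still sees the first run
print(second.context)
```

Swapping `InMemoryStore` for another class with the same two methods changes where memories live without touching the Agent.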

The basic installation comes with a zero-dependency local backend