
Claude Code's major leak: stop just cloning it. The most cutting-edge harness is now open source.

Silicon Star People Pro, 2026-04-01 10:48
The best learning materials for Agent Harness

On March 31st, Anthropic probably didn't expect to give the developer community a gift in this way.

Security researcher Chaofan Shou discovered that a sourcemap (.map file) was hidden in the npm package of Claude Code. Sourcemaps exist so developers can debug minified code and should have been stripped from the production build long ago. Because this one wasn't, anyone can reconstruct the complete source code of Claude Code from it.

There are 1,906 files, 512,000 lines of code, more than 40 tools, and 85 slash commands. Within a few hours, the code had been mirrored to GitHub and racked up thousands of stars and forks.

What's even more ironic is that the code contains a subsystem called "Undercover Mode", designed specifically to keep Anthropic's internal code names out of git commits and prevent information leakage. They carefully built an anti-leakage mechanism, then shipped the entire source code inside an npm package.

But what this article wants to discuss is not the mistake itself. It is the truly valuable thing inside this code: what a production-grade agent harness looks like.

How high is the quality of Claude Code?

The agent harness is a new thing the whole industry is still exploring. Anthropic has repeated one view over the past year: the model itself is just the engine, and the harness is the whole vehicle. They have proposed a series of design principles such as context engineering, minimal viable tool sets, and sub-agent isolation.

Now that the source code is out in the open, we can finally see whether they practice what their own technical documentation preaches.

The answer is that they not only followed it but also had a hidden ace up their sleeve.

The following evaluation and description of the harness are mainly based on Anthropic's technical documentation.

Storage layer: Context engineering and memory

Context costs money

The longer the context, the more likely the model is to get lost. This problem is called context rot in the industry. More context is not always better. For each additional token, the attention allocated to all other tokens decreases, and important information gets diluted.

Claude Code sets a hard limit for each piece of content to forcibly control what can enter the window and how much.

The skill list can take up at most 1% of the entire window, and each description should not exceed 250 characters:

```typescript
// verbose whenToUse strings waste turn-1 cache_creation tokens
// without improving match rate.
export const MAX_LISTING_DESC_CHARS = 250
```

Why set such a limit? Because the purpose of the skill list is to let the model know "this tool exists", not to teach the model how to use it. Finding tools relies on keyword matching, and there is no difference in match rate between a 500-word description and a 50-word one. The extra words are pure waste.
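A minimal sketch of how such a budget might be enforced. `MAX_LISTING_DESC_CHARS` mirrors the constant from the leak; the 1% window share, the `Skill` shape, and the function names are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch: cap each description and the whole listing.
const MAX_LISTING_DESC_CHARS = 250
const LISTING_WINDOW_SHARE = 0.01 // skill list may use at most 1% of the window

interface Skill {
  name: string
  whenToUse: string
}

// Hard-truncate any over-long description to the per-entry cap.
function truncateDescription(desc: string): string {
  return desc.length <= MAX_LISTING_DESC_CHARS
    ? desc
    : desc.slice(0, MAX_LISTING_DESC_CHARS - 1) + "…"
}

// Build the listing, dropping trailing skills once the character
// budget (a rough proxy for tokens) is exhausted.
function buildSkillListing(skills: Skill[], windowChars: number): string {
  const budget = Math.floor(windowChars * LISTING_WINDOW_SHARE)
  const lines: string[] = []
  let used = 0
  for (const s of skills) {
    const line = `${s.name}: ${truncateDescription(s.whenToUse)}`
    if (used + line.length > budget) break
    lines.push(line)
    used += line.length
  }
  return lines.join("\n")
}
```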

The system prompt is split into two halves. The first half contains instructions shared by all users; its content is fixed, so it can be cached and reused directly on the next call. The second half contains content specific to this user and this session, generated fresh each time. This way, each API call only has to process the half that changes, saving a large amount of redundant computation.
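The split above maps naturally onto prompt-caching breakpoints in Anthropic's Messages API. A hedged sketch, where the `cache_control` field follows the public prompt-caching API but the prompt contents and function name are invented for illustration:

```typescript
// Hypothetical sketch: a stable, cacheable prefix plus a per-session suffix.
interface SystemBlock {
  type: "text"
  text: string
  cache_control?: { type: "ephemeral" }
}

function buildSystemPrompt(sessionContext: string): SystemBlock[] {
  return [
    {
      // Shared by all users: identical bytes on every call, so the API
      // can reuse the cached prefix instead of reprocessing it.
      type: "text",
      text: "You are Claude Code... (shared instructions)",
      cache_control: { type: "ephemeral" },
    },
    {
      // Session-specific half: regenerated on each call, never cached.
      type: "text",
      text: sessionContext,
    },
  ]
}
```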

The results of large tool calls are not put into the context; they are written straight to disk, and the model is handed a file path. If a command returns thousands of lines of logs, stuffing them into the context would squeeze out other useful information. A file reference takes up only one line.

The essence of a production system is to handle failures

When the model approaches the limit of the context window, it tends to wrap up prematurely. Anthropic calls this "context anxiety": like a person who knows time is running out, it starts skipping steps and taking shortcuts. The solution is to clear the window entirely, distill the current progress into a structured handover document, and start the next window from that handover instead of from a nearly full context.

Claude Code implements this as three-level compression: first try a lightweight summary; if that is not enough, perform automatic compaction; and if that still fails, perform forced compaction after the API reports an error. Each level is tried only when the previous one fails.
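The escalation logic can be sketched as a simple fallback chain. Everything here is illustrative: the `Compactor` type, the function names, and the concrete strategies are assumptions, standing in for the real summarizer and compactors:

```typescript
// Hypothetical sketch: each level either returns a smaller context or
// throws; escalate only when the previous level fails.
type Compactor = (ctx: string[]) => string[]

function compact(ctx: string[], levels: Compactor[]): string[] {
  for (const level of levels) {
    try {
      return level(ctx) // first level that succeeds wins
    } catch {
      // fall through to the next, more aggressive level
    }
  }
  throw new Error("all compaction levels failed")
}

// Escalating stand-ins: summary → auto-compact → forced compact.
const lightweightSummary: Compactor = (_ctx) => {
  throw new Error("not enough headroom") // simulate a failed summary
}
const autoCompact: Compactor = (ctx) => ctx.slice(-10)
const forcedCompact: Compactor = (ctx) => ctx.slice(-2)
```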

Above the three levels, there is also a circuit breaker:

```typescript
// BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272)
// in a single session, wasting ~250K API calls/day globally.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
```

Someone ran the numbers in BigQuery and found that sessions with consecutive automatic-compaction failures were wasting about 250,000 API calls per day. Compaction usually fails because the context is already corrupted, and continuing to retry is meaningless. So a rule was added: stop after 3 consecutive failures and don't try again.

A demo only needs to work. A production system also needs to know how to cut losses when it fails.
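The cut-your-losses rule above amounts to a tiny circuit breaker. A sketch, where the constant mirrors the leak but the class and method names are invented:

```typescript
// Hypothetical sketch: stop retrying compaction after 3 consecutive
// failures instead of burning API calls on a corrupted context.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

class CompactionBreaker {
  private failures = 0

  recordFailure(): void {
    this.failures++
  }

  recordSuccess(): void {
    this.failures = 0 // any success resets the streak
  }

  // Once tripped, the caller should give up rather than retry.
  get tripped(): boolean {
    return this.failures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES
  }
}
```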

Memory is not about storing everything

For an agent working across context windows, every new window starts with total amnesia about what came before. Anthropic compares this to shift engineers: each incoming shift knows nothing about what the previous shift did and has to start from scratch.

Claude Code uses a background sub-agent to periodically extract key information from the conversation, store it, and inject it back when needed. But it does not store or inject everything: memory screening is done with Sonnet, which judges which memories are relevant to the current task.

One detail: the reference documentation of a tool that has just been used is not injected:

```typescript
async function selectRelevantMemories(
  query: string,
  memories: MemoryHeader[],
  recentTools: readonly string[], // filter out docs for tools that were just used
): Promise<MemoryHeader[]>
```

The model has just used this tool, so a live usage record already exists in the context. Injecting the documentation at that point is pure redundancy: it takes up space without adding value.
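A simplified, synchronous sketch of just the filtering step. The `MemoryHeader` name comes from the signature above, but its fields and the function name are assumptions:

```typescript
// Hypothetical sketch: drop memories that document a tool the model
// just used, since the context already shows a live usage of it.
interface MemoryHeader {
  id: string
  toolName?: string // set when the memory is a tool's reference doc (assumption)
}

function filterRecentToolDocs(
  memories: MemoryHeader[],
  recentTools: readonly string[],
): MemoryHeader[] {
  const recent = new Set(recentTools)
  return memories.filter(
    (m) => m.toolName === undefined || !recent.has(m.toolName),
  )
}
```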

Network layer: Tool access

The boundary of tools is the boundary of the agent's capabilities

If there are too many tools, the model doesn't know which one to use, which is as bad as having no tools at all. This is a well-known pitfall in agent design: tool selection itself consumes the model's reasoning capacity, and the more options there are, the likelier a wrong choice.

Claude Code has more than 50 tools. The solution: most tools don't appear in the context at all initially, and the model gets a tool's full definition only when it actively searches for it:

```typescript
export function isDeferredTool(tool: Tool): boolean {
  if (tool.alwaysLoad === true) return false
  if (tool.isMcp === true) return true // MCP tools are deferred by default
  if (tool.name === TOOL_SEARCH_TOOL_NAME) return false // ToolSearch is never deferred
  // ... (remaining cases elided in the leaked excerpt)
}
```

ToolSearch is always fully loaded because the model relies on it to find the other tools. If ToolSearch itself were deferred, the model could find nothing at all.

Each tool also carries a set of attributes: whether it can run in parallel, whether it modifies the file system, the size threshold above which results are written to disk, and whether it stops or continues when the user interrupts.

These attributes are not just documentation for humans. They are the basis for the scheduling engine to decide how to execute the tool.
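A sketch of attribute-driven scheduling: read-only, parallel-safe calls are batched together while mutating ones serialize. The attribute names and the partitioning function are assumptions about how such a scheduler might consume the declared metadata:

```typescript
// Hypothetical sketch: partition a batch of tool calls by their
// declared attributes before execution.
interface ToolAttrs {
  name: string
  isConcurrencySafe: boolean // declared safe to run in parallel
  mutatesFileSystem: boolean // declared to modify files
}

function partitionBatch(
  calls: ToolAttrs[],
): { parallel: ToolAttrs[]; serial: ToolAttrs[] } {
  const parallel: ToolAttrs[] = []
  const serial: ToolAttrs[] = []
  for (const c of calls) {
    // Only calls that are both concurrency-safe and non-mutating
    // may run together; everything else runs one at a time.
    if (c.isConcurrencySafe && !c.mutatesFileSystem) parallel.push(c)
    else serial.push(c)
  }
  return { parallel, serial }
}
```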

Permission pop - ups can be eliminated in advance

Tool calls must pass through five checkpoints before executing: input validation, permission logic, rule matching, hook interception, and classifier or user confirmation. Each checkpoint can stop the process, and each takes time.

The slowest part is waiting for the user to confirm. To eliminate this wait, the classifier starts running before the pop-up window even appears.

If the classifier decides "this command is probably fine", the pop-up is skipped entirely. The user never notices the wait, because the judgment finished while the pop-up was still being prepared.
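The speculative check can be sketched as kicking off the classifier promise before any dialog work. `classify` and `askUser` are hypothetical stand-ins for the real classifier call and permission dialog:

```typescript
// Hypothetical sketch: run the safety classifier speculatively and
// only fall back to the permission dialog when it is unsure.
async function requestPermission(
  classify: () => Promise<"allow" | "ask">,
  askUser: () => Promise<boolean>,
): Promise<boolean> {
  const verdict = classify() // started before any dialog is prepared
  if ((await verdict) === "allow") {
    return true // pop-up never shown; user perceives zero wait
  }
  return askUser() // classifier unsure: hand the decision to the user
}
```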

Container layer: Sub - agent design

Sub-agents are not nesting dolls; they are context isolation

The value of a sub-agent is that it can be discarded after use. It spends tens of thousands of tokens completing a subtask, hands only the conclusion back to the main agent, and the entire intermediate process is thrown away. The main agent's context holds the conclusion, not the journey.

There are four execution modes: synchronous (the main agent waits for the sub-agent to finish), asynchronous background (the sub-agent runs in the background and notifies the user on completion), worktree file-system isolation (the sub-agent modifies code in an independent git worktree without touching the main directory, then merges the changes back), and cross-machine isolation (it runs entirely on another machine). The higher the risk, the more thorough the isolation.

The tools available to sub-agents are filtered, and a sub-agent cannot recursively spawn itself, which prevents infinite nesting.
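A minimal sketch of that filtering step. The `"Task"` spawn-tool name and the function are assumptions used only to illustrate stripping the recursion path:

```typescript
// Hypothetical sketch: when spawning a sub-agent, intersect the parent's
// tools with an allowlist and always strip the spawn tool itself, so a
// sub-agent can never launch another sub-agent.
const SPAWN_TOOL = "Task" // assumed name of the sub-agent spawn tool

function toolsForSubagent(
  parentTools: string[],
  allowed: Set<string>,
): string[] {
  return parentTools.filter((t) => t !== SPAWN_TOOL && allowed.has(t))
}
```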

Connections established by a sub-agent are closed by the sub-agent itself when it exits. If it instead reused the parent agent's existing connections, it could not close them, because the parent is still using them.

A new Hook system has been added

The hook system turns the harness into a platform

Claude Code exposes 27 event hooks. At any critical moment during the agent's run, users can step in and intervene:

What can an intervention do? Not just intercept, but also modify. Before a tool executes, its input parameters can be changed: if the agent tries to write a file, a hook can quietly redirect the path to a sandbox directory. The agent, none the wiser, carries on as usual, and the file lands somewhere safe.

There are two ways to write a hook. One is to run a script: exit code 0 allows the call to continue, exit code 2 intercepts it, and the logic in between is whatever the user writes. The other is to let Haiku judge: give it a description and criteria, and the model decides whether to allow the call, with a 30-second wait limit.
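The script-hook contract above can be sketched as follows. The `HookResult` shape, the `modifiedInput` rewrite channel, and the function name are assumptions; only the exit-code semantics (0 = allow, 2 = intercept) come from the article:

```typescript
// Hypothetical sketch of the script-hook contract: exit code 0 lets the
// call proceed (optionally with rewritten input); exit code 2 blocks it.
interface HookResult {
  exitCode: number
  modifiedInput?: Record<string, unknown> // e.g. a path redirected to a sandbox
}

function applyHook(
  input: Record<string, unknown>,
  hook: (input: Record<string, unknown>) => HookResult,
): { proceed: boolean; input: Record<string, unknown> } {
  const result = hook(input)
  if (result.exitCode === 2) {
    return { proceed: false, input } // intercepted: tool never runs
  }
  // Allowed: use the rewritten input if the hook supplied one.
  return { proceed: true, input: result.modifiedInput ?? input }
}
```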

The result of this design is that security policies, audit logs, and enterprise compliance requirements can all be attached from the outside without modifying Claude Code itself. Different companies have different security rules; in the past they had to fork the code and patch it themselves, but now a few hooks suffice. The harness has gone from a fixed product to a customizable base.

Conclusion

The above seven points are the parts that are easiest to explain from the source code.

There are many other equally interesting things hidden inside: the precise reconstruction order of compacted messages, the handling of race conditions in tool-concurrency partitions, the separate security-bypass handling for Zsh and PowerShell in Bash commands, the lifecycle management of sub-agents' MCP servers... Behind every detail is a real pitfall someone hit.

If you want to understand how a production - grade harness actually works, just read the code.

512,000 lines of code are solving one problem: how to make a fallible language model stably complete an engineering task that requires many steps.

This problem is more difficult to answer and more valuable than "which model is smarter".

Over the past two years, the industry has blamed agent failures on insufficient model capability. But Claude Code suggests the opposite: the models are already good enough, and what is missing is a carefully crafted harness. Context quotas, compaction and circuit breaking, deferred tool loading, sub-agent isolation, hook platforms... none of these are Anthropic's exclusive secrets. They are problems any team building a stable agent will have to solve sooner or later.

Anthropic first provided a solution and then unexpectedly made the answer public. This may be the most worthy part to study in this accidental leak.

This article is from the WeChat official account “Silicon Star People Pro”, author: Dong Daoli. Republished by 36Kr with permission.