
As soon as humans leave their seats, AI evolves. Berkeley open-sources MetaClaw, leaving static agents in a panic.

New Intelligence Yuan (新智元), 2026-03-31 08:46
While you're in a meeting, is AI secretly upgrading? Four universities, including the University of California, Berkeley, have open-sourced MetaClaw, enabling agents to continuously evolve while you're in meetings, away from your desk, or asleep, directly shattering the industry's ironclad rule of "frozen upon launch."

It's time for the weekly regular meeting again.

Your computer desktop calendar says "Weekly meeting 14:00 - 15:30", and the screen is locked.

Meanwhile, a background AI process confirms that you won't be back for a while and automatically launches the training window:

The morning's mistakes are distilled into rules and injected into the system prompt; then cloud-based LoRA fine-tuning takes over.

Ninety minutes later, when you return to your workstation after the meeting, the Agent in front of you has quietly completed a round of self-iteration.

This is what the open-source MetaClaw framework can achieve:

It lets an Agent that is already serving online keep evolving from its failures, without interrupting service.

This research breaks the default rule in the Agent industry that "once launched, it is frozen".

The MetaClaw framework is jointly launched by the University of North Carolina at Chapel Hill, Carnegie Mellon University, the University of California, Santa Cruz, and the University of California, Berkeley.

https://arxiv.org/pdf/2603.17187

Topping the charts as soon as it was open-sourced

The moment MetaClaw was released, it topped the HuggingFace list, and the concept it embodies, continuous Agent evolution, has drawn intense attention from AI researchers and developers worldwide.

What best reflects the maturity of its toolchain is its extremely low deployment threshold.

Judging from the console operations shown in the official repository, its elaborate dual fast/slow-loop mechanism and OMLS scheduler have been reduced to just two commands.

Developers need only run "metaclaw setup" once for initial configuration, then "metaclaw start --daemon" to launch the system silently as a background daemon.
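Based on the two commands quoted from the official repository, a typical session might look like the usage fragment below; any behavior beyond the two commands themselves (what the setup wizard asks, what the daemon logs) is an assumption, not documented detail:

```shell
# One-time configuration (per the repository's quoted workflow);
# presumably covers credentials, calendar access, and model backend
metaclaw setup

# Launch as a background daemon: the agent keeps serving requests
# while the fast/slow learning loops run opportunistically
metaclaw start --daemon
```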

This out-of-the-box packaging closes the usual gap between academic prototypes and real-world deployment.

Breaking the structural dilemma of "frozen once launched"

Currently, the vast majority of Agents face a harsh reality when it comes to capability iteration: train once, deploy, and then stay unchanged for a long time.

However, the real world is constantly changing: task requirements are shifting, work processes are being modified, and toolchains and organizational rules are also constantly being updated.

On platforms like OpenClaw, an Agent may need to connect to more than 20 messaging channels simultaneously.

Task distribution shifts by the hour, but the Agent's abilities stay frozen at release time.

On the surface, the industry already has plenty of stopgap solutions, such as recording trajectories, building static skill libraries, or running online reinforcement learning.

However, these solutions often only solve part of the problems:

Storing only the original trajectories without extracting transferable knowledge will result in verbose and fragmented information;

Static skill libraries and weight optimization remain disconnected from each other;

Retraining an Agent usually means that it must be shut down, making it impossible to have both online service and continuous evolution.

This is the real contradiction faced by "static Agents": they must be online 24/7, but the world they face is constantly changing.

An Agent that cannot adapt to the new task distribution, no matter how strong its initial ability is, is likely to seem rigid in long-term practical applications.

Walking on two legs

Fast adaptation and slow evolution

To break the conflict between "non-stop operation" and "continuous evolution", MetaClaw splits the update mechanism into two loops with completely different time scales.

The system architecture diagram of MetaClaw shows its "dual fast - slow cycle" learning mechanism. The left shows how the OMLS scheduler monitors the user's Google Calendar and the idle state of the keyboard and mouse, and the right shows how the system separates the support set and the query set for skill extraction (fast adaptation) and LoRA weight fine-tuning (slow evolution).

The first path is skill-driven fast adaptation.

When an Agent fails in a task, the system will hand over the failure trajectory to another large model for analysis, extract reusable behavior rules, and immediately inject them into the system prompt.

This process does not modify the model weights, does not interrupt the service, and takes effect immediately.

The paper lists typical high-frequency rules: unify the time format, back up before performing high-risk file operations, and strictly follow the naming convention.

More importantly, these rules are not patches bound to a single task but transferable knowledge across tasks.

A single correction regarding the time format can improve the stability of all subsequent tasks involving time processing.
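The fast loop described above can be sketched as follows. All names, the trajectory format, and the stubbed rule extractor are illustrative stand-ins for this article, not MetaClaw's actual API; a real system would hand the full failure trajectory to a large model for analysis:

```python
from dataclasses import dataclass, field

@dataclass
class SkillLibrary:
    """Reusable behavior rules to be injected into the system prompt."""
    version: int = 0
    rules: list = field(default_factory=list)

    def add_rule(self, rule: str) -> None:
        if rule not in self.rules:
            self.rules.append(rule)
            self.version += 1  # bump version so stale trajectories can be invalidated later

def extract_rule(failure_trajectory: str) -> str:
    """Stand-in for the analysis model that distills a failure into a rule."""
    if "timestamp mismatch" in failure_trajectory:
        return "Always normalize timestamps to ISO 8601 before comparison."
    return "Back up files before any destructive operation."

def build_system_prompt(base_prompt: str, skills: SkillLibrary) -> str:
    """Inject current rules into the prompt -- no weight update, no downtime."""
    rule_block = "\n".join(f"- {r}" for r in skills.rules)
    return f"{base_prompt}\n\nFollow these learned rules:\n{rule_block}"

skills = SkillLibrary()
skills.add_rule(extract_rule("task failed: timestamp mismatch in report"))
prompt = build_system_prompt("You are a desktop assistant.", skills)
```

Because the correction lives in the prompt rather than in a single task's patch, every subsequent task that touches time handling inherits it, which is the cross-task transfer the paper emphasizes.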

The second path is opportunistic policy optimization.

When the user is inactive, the system will combine the process reward model (PRM) and LoRA to perform gradient-based reinforcement learning (RL) weight updates.

Fast adaptation is the quick tactical fix that stops the bleeding; weight-level optimization is the strategic consolidation of capability.

To organically combine the two, MetaClaw introduces a core design: separating the support set from the query set and strict skill version control.

If a failed sample has been repaired by the newly extracted rules, using this sample in the reinforcement learning stage will lead to "stale reward contamination": the model will continue to be punished for a problem that has already been solved.

MetaClaw's approach is to mark the trajectories with skill version numbers. After the skill library is upgraded, the invalid samples of the old version are cleaned up, and only the data generated after the new skills take effect are retained for RL training.

This essentially achieves the real unity of "memory" and "evolution".
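The version-control idea can be illustrated with a minimal sketch. The field names and the exact filtering policy here are assumptions based on the paper's description, not its implementation:

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    task_id: str
    reward: float
    skill_version: int  # version of the skill library active when collected

def filter_for_rl(trajectories, current_version):
    """Keep only samples collected under the current skill library.
    A failure that newly extracted rules already repair would otherwise
    keep punishing the model -- the "stale reward contamination" problem."""
    return [t for t in trajectories if t.skill_version == current_version]

batch = [
    Trajectory("t1", reward=0.0, skill_version=3),  # failed before the rule fix
    Trajectory("t2", reward=1.0, skill_version=4),  # collected after the upgrade
    Trajectory("t3", reward=0.0, skill_version=4),  # a genuine, still-unfixed failure
]
clean = filter_for_rl(batch, current_version=4)  # t1 is dropped
```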

Training during fragmented time

OMLS scheduler

Model training requires time and computing power. So how does MetaClaw make users hardly aware of it?

The answer lies in its designed opportunistic meta-learning scheduler (OMLS).

OMLS specifically monitors three types of signals: preset sleep periods, the idle state of the keyboard and mouse at the system level, and the schedule occupancy of Google Calendar.

As long as any signal indicating that the user is temporarily away is triggered, the training window will automatically open.

The trainer supports pausing and resuming at any time, which means that even the fragmented time when the user is away for a few minutes can be converted into a time window for continuous AI training.
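The scheduling decision amounts to an OR over the three signal types. The sketch below is a toy illustration under stated assumptions: the signal sources, the 5-minute idle threshold, and the sleep window are invented defaults, and a real implementation would hook into OS idle APIs and the Google Calendar API:

```python
from datetime import datetime, time

def in_sleep_window(now: datetime, start: time = time(23, 0),
                    end: time = time(7, 0)) -> bool:
    """True if `now` falls inside the preset sleep period."""
    t = now.time()
    if start > end:  # window wraps past midnight, e.g. 23:00 -> 07:00
        return t >= start or t < end
    return start <= t < end

def should_open_training_window(now: datetime, idle_seconds: float,
                                calendar_busy: bool,
                                idle_threshold: float = 300) -> bool:
    """Open a training window if ANY away-signal fires: the preset sleep
    period, keyboard/mouse idle past the threshold, or a calendar event."""
    return (in_sleep_window(now)
            or idle_seconds >= idle_threshold
            or calendar_busy)

# During the 14:00 weekly meeting, hands off the keyboard for 10 minutes:
decision = should_open_training_window(
    datetime(2026, 3, 31, 14, 10), idle_seconds=600, calendar_busy=True)
```

Pairing this trigger with a trainer that can checkpoint and resume is what lets even a few minutes away from the desk be converted into useful training time.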

In the past, AI upgrades were often a centralized project, requiring service shutdown, retraining, version switching, and then relaunching.

MetaClaw turns humans' fragmented idle time into a mini workshop for continuous AI evolution.

In addition, the framework uses a proxy architecture with a cloud training interface, so it does not require expensive local GPU resources. It can plug directly into existing personal Agents and various model platforms, supporting one-click deployment and continuous meta-learning.

Filling in procedural knowledge

A data leap for weak models

The actual effect of this framework has been directly verified in the test data.

The paper's team built the MetaClaw-Bench benchmark: 934 questions simulating 44 working days of task flow, designed specifically to evaluate whether an Agent grows more capable the longer it is used within a continuous task stream.

The test results show that in the case of only injecting behavior rules, the relative accuracy of the evaluated model can be increased by up to 32.2%.

In terms of the end-to-end task completion rate that reflects real execution ability, the evaluated model increased from 2.0% to 16.5%, achieving an 8.25-fold increase.
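As a quick sanity check, the 8.25-fold figure follows directly from the two reported completion rates:

```python
# End-to-end task completion rates reported for the evaluated model,
# in percent, before and after the full MetaClaw framework.
before, after = 2.0, 16.5
fold_increase = after / before  # 16.5 / 2.0 = 8.25
```

Note that the "up to 32.2%" relative accuracy gain is reported separately for the rules-only setting and is not derived from these two numbers.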

In another AutoResearchClaw autonomous research pipeline consisting of 23 stages (covering literature review, experimental design, code generation, result analysis, and paper writing), even without weight training and only relying on skill injection, the comprehensive robustness of the system increased by 18.3%, the stage retry rate decreased by 24.8%, and the number of iterative optimization rounds decreased by 40%.

The test data reveals a more crucial phenomenon: MetaClaw is first and foremost a framework for continuous Agent evolution, and it has a particularly obvious gain for Agents driven by weak base models.

The paper's analysis points out that weaker models lack implicit procedural knowledge: the concrete operating rules, execution habits, and formatting disciplines. Because the skill library writes this knowledge out explicitly, skill injection alone yields a larger accuracy gain for them.

In contrast, GPT-5.2 starts from a higher baseline, leaving less room for improvement and making it more prone to a ceiling effect.

However, the paper also emphasizes that skill injection mainly improves rule compliance and partial execution quality and is not enough to stably unlock the end-to-end completion rate in high-intensity tasks.

What really enables the evaluated model to achieve an 8.25-fold increase is the complete MetaClaw framework after combining skills and weight-level policy optimization.

Paradigm shift in the era of Agent evolution

Of course, MetaClaw still has certain limitations.

The paper team points out that the current benchmark tests are conducted in a simulated environment, which is not completely equivalent to a complex production environment; the detection of idle windows also depends on specific user system configurations.

However, MetaClaw clearly points to a direction of paradigm shift: the lifecycle of Agents is evolving from "delivered after training" to "continuing to grow after delivery".

The continuous updates of its GitHub repository (including engineering progress such as proxy access, multi-client support, and cross-session memory) indicate that this concept is rapidly transforming into a usable toolchain.

Putting it back into the industry context, its significance is even greater.

Compared with the OpenClaw-RL proposed by the Princeton team recently (which tends to directly use all interaction signals for training), MetaClaw chooses a hierarchical strategy of "fast rules plus slow weights".

The former pursues immediate correction, while the latter pursues long-term solidification. The two represent different engineering considerations for the evolution path of the next-generation Agents.

What will determine the upper limit of the future model's ability is no longer just the parameter scale at the time of release, but also the closed-loop mechanism for continuously transforming experience and self-iterating in real usage scenarios.

Your calendar, the state of your keyboard and mouse, and every time you leave your seat may become an opportunity for the next AI ability upgrade.

The real intelligent evolution has just begun at the work site.

Reference materials:

https://arxiv.org/abs/2603.17187

https://github.com/aiming-lab/MetaClaw

This article is from the WeChat public account “New Intelligence Yuan”. Author: New Intelligence Yuan, Editor: Yuan Yu. Republished by 36Kr with authorization.