
An NVIDIA team let coding Agents take over real robot experiments, achieving a 99% success rate.

机器之心 (MachineHeart), 2026-06-18 08:22
Automated research is realized in the physical world for the first time.

This time, automated research has truly stepped out of the code sandbox and into the real physical world.

Recently, Jim Fan, head of NVIDIA GEAR Lab, introduced a new project called ENPIRE. It is the lab's first implementation of automated research on robot hardware.

They placed 8 Codex Agents in a robot fleet, allocated GPU computing power and a sufficient token budget, and gave them only a simple goal: solve the task as soon as possible, keep the robots busy but safe, and don't waste computing power.

After that, humans largely stepped back. The Agents drive the entire closed loop autonomously: resetting the scene, searching the literature, implementing ideas and building infrastructure, training and deploying policies, verifying their own results, analyzing logs, and modifying code. They keep iterating until high-precision dexterous tasks, such as tying cable ties, organizing pin boxes, and installing GPUs, are completed reliably on real hardware.

They also observed a "physical scaling law": increasing the number of parallel robots (for example, from a small number to 8) significantly speeds up task solving.

Currently, some systems in the lab already iterate on themselves overnight without human intervention. Researchers only need to check the reports in the morning.

Jim Fan said that the future goal is to allow team members to take a rest with peace of mind, and even NVIDIA CEO Jensen Huang won't notice that the lab is still running autonomously.

The ENPIRE project is planned to be fully open-sourced. By then, ordinary developers are also expected to be able to build a similar autonomous robot research system at home.

Project address: https://research.nvidia.com/labs/gear/enpire/

ENPIRE System Architecture: A Closed Loop Composed of Four Modules

ENPIRE is a framework designed for coding Agents. It builds a repeatable physical feedback loop from four core modules:

  • Environment Module (EN): handles automatic reset and verification.
  • Policy Improvement Module (PI): launches policy optimization.
  • Rollout Module (R): supports parallel policy evaluation on one or more robots.
  • Evolution Module (E): lets the coding Agents analyze logs, consult the literature, and improve the training infrastructure and algorithm code to address failure modes.

This closed-loop system turns real-world robot learning into a controllable optimization process managed by Agents, minimizing manual input and supporting fair ablation experiments across different training recipes and Agent variants.

With ENPIRE, frontier coding Agents can independently develop policies and achieve a 99% success rate on challenging real-world dexterous manipulation tasks such as PushT, organizing pins into pin boxes, and using cutters to cut cable ties.
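To make the four-module loop concrete, here is a minimal toy sketch of how EN, PI, R, and E could hand off to one another. It is not ENPIRE's code: every name (`LoopState`, `reset_and_verify`, `improve_policy`, `rollout`, `evolve`, `run_loop`) is hypothetical, and the robots are replaced by a simulated success probability.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Bookkeeping for one hypothetical closed-loop run."""
    skill: float = 0.2                      # stand-in for policy quality in [0, 1]
    history: list = field(default_factory=list)


def reset_and_verify():
    """EN: reset the scene and confirm it is ready (always succeeds in this toy)."""
    return True


def improve_policy(state, step_size):
    """PI: one round of policy optimization, modeled as a bounded skill bump."""
    state.skill = min(1.0, state.skill + step_size)


def rollout(state, rng, n_robots, episodes_per_robot):
    """R: evaluate the current policy on every robot and return the success rate."""
    trials = n_robots * episodes_per_robot
    successes = sum(rng.random() < state.skill for _ in range(trials))
    return successes / trials


def evolve(success_rate):
    """E: choose the next step size from the logs (bigger changes while failing often)."""
    return 0.2 if success_rate < 0.5 else 0.05


def run_loop(target=0.99, n_robots=8, episodes_per_robot=25, max_iters=50, seed=0):
    """Run EN -> PI -> R -> E until the measured success rate reaches the target."""
    rng = random.Random(seed)
    state = LoopState()
    step_size = 0.2
    for _ in range(max_iters):
        if not reset_and_verify():
            continue                        # skip the round if the scene is not ready
        improve_policy(state, step_size)
        rate = rollout(state, rng, n_robots, episodes_per_robot)
        state.history.append(rate)
        if rate >= target:
            break
        step_size = evolve(rate)
    return state
```

In the real system each of these stubs would be replaced by Agent-written code running against hardware; the sketch only illustrates the control flow of the loop.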

Key Discovery: Resetting the Environment Is Easier Than Completing the Task Itself

One of the key observations is that for many robot tasks, resetting the environment is often easier than completing the task itself.

ENPIRE's approach is therefore to have the Agents first build an automatic environment reset through Code-as-Policy. In many cases, the so-called reset is actually a pick-and-place task, which can be solved by Cap-X.

Next, the Agents write a heuristic-based reward function. The research team then puts the environment into a sandbox and lets the Agents conduct automated research to improve the score.

This also echoes Karpathy's definition of automated research: it is not simply tuning a hyperparameter or modifying a small piece of code. The Agents explore different paradigms from the Internet and rewrite any part that might improve performance, including algorithms, training objectives, and even data loaders.

In the pin-insertion task, an Agent even wrote a contact-force safety controller on its own, and it worked better than simply tuning a few reinforcement learning parameters.
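As an illustration of what a heuristic-based reward might look like, the sketch below scores a PushT-style scene by how close the block's pose is to the goal pose. The article does not describe the Agents' actual reward functions, so the function name `pose_reward`, its parameters, and the scale constants are all assumptions.

```python
import math


def pose_reward(block_xy, block_theta, goal_xy, goal_theta,
                pos_scale=0.05, ang_scale=0.2):
    """Hypothetical heuristic reward in (0, 1]; equals 1.0 only at the goal pose.

    block_xy / goal_xy are (x, y) in meters, angles are in radians.
    pos_scale and ang_scale are made-up tolerances, not values from the article.
    """
    pos_err = math.dist(block_xy, goal_xy)
    # Wrap the angle difference into [-pi, pi] before taking its magnitude.
    diff = block_theta - goal_theta
    ang_err = abs(math.atan2(math.sin(diff), math.cos(diff)))
    return math.exp(-pos_err / pos_scale) * math.exp(-ang_err / ang_scale)


def is_success(reward, threshold=0.9):
    """Hypothetical verification rule: count an episode as solved above a threshold."""
    return reward >= threshold
```

A smooth score like this gives the Agents a signal to optimize even when an episode fails, while the thresholded check supplies the success rate they report.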

New Metrics: MRU and MTU

ENPIRE's scalability depends on the size of the Agent team and the available computing resources. Here, however, the truly scarce resource is not the GPU but robot time.

When the research team gave the Agents 8 robots instead of 1, the time needed for the pin-insertion task to reach near-perfect performance dropped from more than 1.5 hours to about 40 minutes. The Agents coordinate through Git: sharing code, abandoning unpromising ideas, and autonomously picking up each other's best runs.

This points to a bigger shift: robot research is becoming environment design, that is, building environments in which coding Agents can conduct automated research. Algorithm work has moved up a level, toward building feedback loops that Agents can close by themselves.

This loop also compounds: a skill an Agent masters today becomes a building block for constructing and resetting a harder task environment tomorrow. Capabilities bootstrap new capabilities.

In this paradigm, the real hard constraint is the real-world interaction budget.
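The reported times can be turned into a rough speedup figure. The snippet below is only back-of-the-envelope arithmetic on the two numbers quoted above; because the single-robot time is given as "more than 1.5 hours" and the 8-robot time as "about 40 minutes", the results are approximate lower bounds, and the helper name is made up for illustration.

```python
def speedup_and_efficiency(t_single_min, t_parallel_min, n_robots):
    """Return (speedup, parallel efficiency) for a run scaled to n_robots."""
    speedup = t_single_min / t_parallel_min
    return speedup, speedup / n_robots


# 1.5 hours = 90 minutes on 1 robot versus about 40 minutes on 8 robots.
speedup, efficiency = speedup_and_efficiency(90, 40, 8)
print(f"speedup >= {speedup:.2f}x, efficiency >= {efficiency:.0%}")
# prints "speedup >= 2.25x, efficiency >= 28%"
```

So 8 robots bought a speedup of at least about 2.25x rather than 8x, which is consistent with the article's point that the robots spend much of their time waiting.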

Therefore, the research team proposed two metrics:

  • Mean Robot Utilization (MRU): The ratio of the time robots actually spend running experiments to the total wall-clock time.
  • Mean Token Utilization (MTU): A measure of how efficiently Agents convert tokens into research progress.

In their experiments, MRU was always below 50%. In other words, the robots sat idle for more than half of the time, waiting for the Agents to think. Better harnesses and faster models will therefore translate directly into real gains.
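MRU as defined above can be computed from per-robot activity logs. The following is a minimal sketch under the assumption that each robot logs non-overlapping (start, end) intervals in seconds while it is running an experiment; the log format and function name are hypothetical. MTU is left out because the article does not give a formula for it.

```python
def mean_robot_utilization(busy_intervals, wall_clock_s):
    """Average, over robots, of busy time divided by total wall-clock time.

    busy_intervals: one list per robot of non-overlapping (start_s, end_s) tuples.
    wall_clock_s:   total elapsed time of the research run, in seconds.
    """
    per_robot = [
        sum(end - start for start, end in intervals) / wall_clock_s
        for intervals in busy_intervals
    ]
    return sum(per_robot) / len(per_robot)


# Two robots over a one-hour run: one busy for 30 minutes, one for 15 minutes.
logs = [
    [(0, 900), (1800, 2700)],
    [(0, 900)],
]
mru = mean_robot_utilization(logs, wall_clock_s=3600)
# mru is 0.375, i.e. the robots were idle 62.5% of the time
```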

PushT is a long-standing robot manipulation benchmark. Solving it usually requires a large amount of human demonstration data plus several hours of behavior cloning training.

However, the team found that Codex, Claude Code, and Kimi Code all "solved" this task in less than 2 hours using a rule-based heuristic method: no neural networks, no training, and no human data.

To let more people try automated research in the physical world at home, they developed a full-stack system based on @LeRobotHF's SO-101 kit + NVIDIA Jetson Thor. This system can complete the PushT task.

Reference Links:

https://x.com/_wenlixiao/status/2066913334994358342

https://x.com/DrJimFan/status/2066921736369766762

This article is from the WeChat official account "MachineHeart" (ID: almosthuman2014), written by Yang Wen, and published by 36Kr with authorization.