One demonstration to grasp them all: Peking University team achieves a breakthrough in general grasping, compatible with any dexterous-hand embodiment
In research on general dexterous-hand grasping, the high dimensionality of the action space, the long-horizon exploration the task requires, and the diversity of objects involved confront traditional reinforcement learning (RL) with low exploration efficiency, complex reward-function design, and complicated training pipelines.
To address this, Peking University and the BeingBeyond team propose the DemoGrasp framework, a simple and efficient general method for learning dexterous-hand grasping.
The method starts from a single successful grasping demonstration trajectory and adapts it to different objects and poses by editing the robot actions in the trajectory: changing the wrist pose determines where to grasp, and adjusting the finger joint angles determines how to grasp.
The core innovation, restructuring the multi-step Markov Decision Process (MDP) of sequential decision-making into a single-step MDP over trajectory edits, substantially improves both the learning efficiency of RL on grasping tasks and performance when transferring to real robots.
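In other words, the whole grasping episode collapses into a single decision. A schematic formulation paraphrasing the description above (the weight λ and the exact reward terms are illustrative, not taken from the paper):

```latex
% Single-step MDP induced by trajectory editing (schematic).
% s_0: initial observation; a: one global edit of the demonstration.
\pi_\theta:\; s_0 \longmapsto a = (T,\, \Delta q_G), \qquad T \in SE(3)
% The edited trajectory is rolled out open-loop; one terminal reward:
R(s_0, a) = \mathbb{1}[\text{grasp succeeds}] - \lambda\,\mathbb{1}[\text{collision during execution}]
```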
Core Design: Single Demonstration + Single-Step RL
From "Multi-Step Exploration" to "Global Editing"
The dilemma of traditional RL: complex exploration in high-dimensional action spaces
- Action space: at each step, the policy must output commands for every joint of a high-degree-of-freedom robot.
- Reward design: an elaborate dense reward function must be crafted to guide the robot to avoid collisions, make contact with objects, grasp successfully, and move smoothly.
- Curriculum learning: a complex multi-stage training schedule must be designed to aid RL exploration.
The core innovation of DemoGrasp is to replace exploration from scratch with a single successful demonstration trajectory: the high-dimensional grasping task becomes a demonstration-editing task, the editing parameters are optimized with single-step RL, and sim-to-real transfer is then achieved through visual imitation learning.
Single Demonstration and Trajectory Editing
A successful trajectory for grasping a specific object encodes general patterns of the grasping task (such as "approach the object → close the fingers → lift the wrist"). Simply adjusting the wrist pose and finger configuration within that trajectory adapts it to new, unseen objects.
DemoGrasp needs to collect only a single successful grasping demonstration trajectory for one object (such as a cube), and can then generate grasping behaviors for new objects and new placements through object-centric trajectory editing:
- Wrist pose editing: in the object coordinate frame, apply a single rigid transformation T ∈ SE(3) to every wrist pose in the original trajectory. Flexibly adjusting the grasp direction and position adapts the trajectory to objects of different sizes, shapes, and graspable regions.
- Finger joint editing: apply an offset Δq_G to the fingers' grasp joint angles, then interpolate proportionally along the demonstration trajectory to generate a motion in which the dexterous hand moves smoothly from its initial open pose to the new grasp pose (see the sketch after this list).
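A minimal sketch of both edits, assuming the demonstration stores wrist poses as 4×4 homogeneous matrices in the object frame and finger joint angles per step (the function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def edit_trajectory(demo_wrist_poses, demo_finger_joints, T_edit, delta_q_G):
    """Apply DemoGrasp-style global edits to one demonstration.

    demo_wrist_poses:   (N, 4, 4) wrist poses in the object coordinate frame
    demo_finger_joints: (N, J) finger joint angles; last step is the grasp pose
    T_edit:             (4, 4) rigid transform in SE(3), output by the policy
    delta_q_G:          (J,) offset added to the demonstrated grasp joint angles
    """
    # Wrist edit: one shared SE(3) transform applied to every wrist pose,
    # shifting and rotating the whole approach relative to the object.
    new_wrist_poses = np.einsum("ij,njk->nik", T_edit, demo_wrist_poses)

    # Finger edit: offset the final (grasp) joint configuration, then
    # re-interpolate from the initial open pose to the new grasp pose.
    # A linear schedule stands in for the paper's proportional interpolation.
    q_open = demo_finger_joints[0]
    q_grasp_new = demo_finger_joints[-1] + delta_q_G
    alphas = np.linspace(0.0, 1.0, len(demo_finger_joints))[:, None]
    new_finger_joints = (1.0 - alphas) * q_open + alphas * q_grasp_new

    return new_wrist_poses, new_finger_joints
```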
Single-Step Reinforcement Learning
In the simulation environment, DemoGrasp uses IsaacGym to create thousands of parallel worlds, each with different objects and placement scenarios.
Learning process: in each simulated world, the policy network outputs one set of wrist and finger editing parameters from the initial observation (end-effector pose, object point cloud, and object pose); the edited trajectory is then executed, and the policy receives a reward based on whether the grasp succeeds and whether any collision occurs during execution.
Through massive parallel trial and error with online reinforcement learning, the policy learns to map observations of differently shaped objects to appropriate editing parameters. A minimal sketch of this single-step loop follows.
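The sketch below shows one episode in plain Python (the `env` interface, `policy` signature, and reward weights are assumptions for illustration, not the actual IsaacGym training code):

```python
def rollout_one_env(policy, env, edit_trajectory):
    """One episode of the single-step MDP: observe once, edit once, score once."""
    # Single observation at t=0: end-effector pose, object point cloud, object pose.
    obs = env.reset()

    # The policy outputs all editing parameters in one shot: a wrist
    # transform (e.g., 6D pose parameters mapped to SE(3)) and finger offsets.
    T_edit, delta_q_G = policy(obs)

    # Edit the stored demonstration and execute it open-loop in simulation.
    wrist_traj, finger_traj = edit_trajectory(
        env.demo_wrist_poses, env.demo_finger_joints, T_edit, delta_q_G)
    success, collided = env.execute(wrist_traj, finger_traj)

    # Terminal reward only: success bonus minus a collision penalty
    # (the exact shaping is the authors' design; these weights are made up).
    reward = float(success) - 0.5 * float(collided)
    return obs, (T_edit, delta_q_G), reward
```

With thousands of such environments running in parallel, a standard on-policy algorithm (e.g., PPO) can update the policy from these one-step transitions.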
Training efficiency: on this single-step MDP with a compact action space, DemoGrasp converges to a success rate above 90% after 24 hours of training on a single RTX 4090 GPU.
Visual Distillation for Sim-to-Real Transfer
The RL policy in simulation relies on accurate object point clouds and poses, which are difficult to obtain in the real world. Through visual imitation learning, DemoGrasp distills this policy into an RGB policy aligned with the real robot, achieving direct transfer from simulation to real hardware.
- Data collection: run the RL policy in simulation and record tens of thousands of successful trajectories, including rendered RGB camera images, robot proprioception, and joint-angle actions at each timestep.
- Model training: a Flow-Matching generative model learns to predict actions from image observations and robot proprioception. To shrink the visual gap between simulation and the real robot, a pre-trained ViT extracts image features during training, and domain randomization (lighting, background, object color and texture, camera parameters, etc.) is applied extensively during simulated data collection (see the sketch after this list).
- Multi-modal adaptation: DemoGrasp accommodates various camera observations, including monocular or binocular and RGB or depth cameras. Experiments show that binocular RGB works best: it mitigates occlusion and exploits texture and contour cues to grasp small, thin objects.
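As a rough picture of the distillation objective, conditional flow matching regresses a velocity field that transports noise to expert actions (PyTorch-style sketch; `action_head` and the conditioning scheme are assumptions, not the released implementation):

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(action_head, vit_features, proprio, actions):
    """Conditional flow-matching loss for action prediction (schematic).

    vit_features: (B, D) image features from a frozen pre-trained ViT
    proprio:      (B, P) robot proprioception
    actions:      (B, A) expert joint-angle actions from simulation rollouts
    """
    # Sample an interpolation time and Gaussian noise per batch element.
    t = torch.rand(actions.shape[0], 1, device=actions.device)
    noise = torch.randn_like(actions)

    # Linear path from noise (t=0) to the expert action (t=1); its time
    # derivative, the regression target, is simply (actions - noise).
    x_t = (1.0 - t) * noise + t * actions
    target_velocity = actions - noise

    # The network predicts the velocity conditioned on the observations.
    cond = torch.cat([vit_features, proprio], dim=-1)
    pred_velocity = action_head(x_t, t, cond)

    return F.mse_loss(pred_velocity, target_velocity)
```

At test time, an action is produced by integrating the learned velocity field from Gaussian noise over a few Euler steps.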
Experimental Results: Strong Performance Both in Simulation and on Real Robots, Improving the Generalization and Scalability of Dexterous Grasping
DexGraspNet, a widely used benchmark dataset for dexterous grasping, contains 3.4K objects.
Grasping with the Shadow Hand on this dataset, DemoGrasp significantly outperforms existing methods: the visual policy reaches a 92% success rate, the generalization gap from training set to test set is only 1%, and the policy handles wide randomization of initial object positions (a 50 cm × 50 cm region), demonstrating stronger spatial generalization.
Cross-Robot Scaling: Adaptable to Any Dexterous Hand and Manipulator
Without adjusting any training hyperparameters, DemoGrasp adapts successfully to six different robot types (five-finger and four-finger dexterous hands, three-finger grippers, and parallel grippers). After training on 175 objects, it achieves an average success rate of 84.6% across multiple unseen-object datasets.
High-Performance Sim-to-Real Transfer
In real-robot tests with a Franka manipulator and the InTime dexterous hand, DemoGrasp successfully grasped 110 unseen objects.
For regular-sized objects, DemoGrasp's success rate exceeds 90%.
For the harder cases of flat objects (such as phone cases and scissors) and small objects (such as bottle caps and rubber ducks), the policy grasps accurately while avoiding collisions, reaching a 70% success rate.
The DemoGrasp framework also scales to more complex grasping tasks in real scenes: it enables language-instructed grasping in cluttered multi-object scenes and achieves a single-grasp success rate of 84% on real robots. Under significant changes in lighting, background, and object placement, the policy's success rate does not decline noticeably.
By integrating a small number of human demonstrations into efficient robot reinforcement learning, DemoGrasp offers a new starting point, and it will support more dexterous-hand tasks in the future, such as functional grasping, tool use, and bimanual manipulation.
A limitation of the current method is the policy's limited closed-loop ability. Future research will split demonstration trajectories at a finer granularity to give the RL policy real-time adjustment and error-recovery capabilities.
In addition, DemoGrasp can be combined with multi-modal large models to build an autonomous grasping agent for open-ended scenarios.
Project homepage: https://beingbeyond.github.io/DemoGrasp/
Paper: https://arxiv.org/abs/2509.22149
This article is from the WeChat official account "QbitAI". Author: DemoGrasp team. Republished by 36Kr with permission.