Dexterous hands can now open bottle caps for your girlfriend: new work from Tongji University, Tsinghua University, Shanghai Jiao Tong University, and others
Dexterous hand skills +1! Now it can help your girlfriend open bottle caps!
Moreover, it can also help squeeze toothpaste and plug in chargers.
Research teams from Tongji University, Tsinghua University, Shanghai Jiao Tong University, the University of Hong Kong, and other institutions have proposed KineDex, a new framework for demonstration collection and policy learning in dexterous manipulation tasks.
In a true "hand-in-hand" guidance mode, the operator's movements are transmitted directly to the dexterous hand while high-fidelity tactile information is collected in sync.
As a result, the XHAND 1 dexterous hand from Star Motion Era has unlocked a range of complex, delicate operations.
In nine complex tasks such as bottle cap tightening, toothpaste squeezing, and syringe pressing, the average success rate of KineDex reaches 74.4%, and the data collection efficiency is more than twice that of teleoperation.
This paper has been accepted by CoRL 2025.
Real "Hand-in-Hand" Guidance for Dexterous Manipulation Learning
Currently, the main obstacle to robots learning fine manipulation (especially tasks that require precise force control) is the lack of high-quality demonstration data.
There have been two mainstream approaches: teleoperation and learning from video. With teleoperation, the operator has no real feel for the contact, so collection is inefficient and failure-prone. Video learning imitates human videos, but the differences between human hands and dexterous hands lead to mismatched movements, and videos carry no tactile information.
In short, neither method makes it easy to collect data with the high-fidelity tactile and force information needed to train robots.
Against this background, the team proposed KineDex, whose core idea is intuitive: hand-in-hand teaching.
In terms of hardware, the setup is a robotic arm equipped with a dexterous hand. The team uses two RGB cameras to collect visual observations: one is fixed in front of the workbench to provide a global view of the scene, and the other is mounted on the wrist near the end-effector for close-range perception of the manipulation area.
The first step is data collection. The core design idea of the KineDex collection system is to let the operator "wear" the dexterous hand, move freely, and perform contact-rich tasks in real time. To enable this hand-in-hand control, the team attached ring-shaped straps to the backs of the four non-thumb fingers of the dexterous hand.
This ensures that contact forces generated during the motion are transmitted to the operator's hand in real time, providing natural tactile feedback throughout the demonstration.
Each demonstration records visual observations, proprioception (the pose of the robotic arm's end-effector and the joint positions of the dexterous hand), tactile sensing, and fingertip forces.
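As a rough illustration of what one recorded time step might look like, here is a minimal Python sketch. The class name, field names, pose layout (position plus quaternion), and the scalar force per finger are all assumptions made for illustration, not the KineDex data format. The default sizes follow the XHAND 1 figures given later in the article (12 joints, 120 tactile points per finger), and five fingers is assumed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DemoFrame:
    """One time step of a demonstration. All field names are hypothetical."""
    scene_image: bytes                # frame from the fixed camera in front of the workbench
    wrist_image: bytes                # frame from the wrist-mounted camera
    ee_pose: Sequence[float]          # arm end-effector pose (assumed: x, y, z + quaternion)
    hand_joints: Sequence[float]      # dexterous-hand joint positions
    tactile: List[Sequence[float]]    # tactile readings, one sequence per finger
    fingertip_force: Sequence[float]  # fingertip force, one value per finger (assumed scalar)


def is_consistent(frame: DemoFrame, n_joints: int = 12, n_fingers: int = 5,
                  taxels_per_finger: int = 120) -> bool:
    """Check that a frame's array sizes match the assumed hand layout."""
    return (
        len(frame.ee_pose) == 7
        and len(frame.hand_joints) == n_joints
        and len(frame.tactile) == n_fingers
        and all(len(t) == taxels_per_finger for t in frame.tactile)
        and len(frame.fingertip_force) == n_fingers
    )
```

A check like this is cheap to run on every frame at recording time, so malformed demonstrations can be discarded before they reach training.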
The next step is data processing. The collected data cannot be used directly for visuomotor policy learning, because the cameras inevitably capture the operator's hand. When the robot later operates on its own, no human hand is present, so training directly on such data would cause a significant distribution shift.
The team therefore uses image inpainting to remove the operator's body parts from the visual observations.
For the raw kinesthetic teaching data, Grounded-SAM is first applied to extract masks of the operator's body parts from the video frames; the frame sequence and the corresponding masks are then fed to the ProPainter model to fill in the areas occluded by the human body.
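The two-stage flow described above (segment the operator, then inpaint the masked regions over the whole sequence) can be sketched as follows. This is only a structural sketch: `segment` and `inpaint` are hypothetical callables standing in for Grounded-SAM and ProPainter, whose real interfaces are not shown here, and images are represented as nested lists of grayscale values to keep the example self-contained. Copying inpainted pixels only inside the mask is an assumption of this sketch.

```python
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values
Mask = List[List[int]]   # 1 where the operator's body is visible, else 0


def remove_operator(
    frames: List[Image],
    segment: Callable[[Image], Mask],
    inpaint: Callable[[List[Image], List[Mask]], List[Image]],
) -> List[Image]:
    """Mask the operator in every frame, then inpaint the whole sequence."""
    # Stage 1: per-frame segmentation (stand-in for Grounded-SAM).
    masks = [segment(frame) for frame in frames]
    # Stage 2: sequence-level inpainting (stand-in for ProPainter).
    filled = inpaint(frames, masks)
    # Keep original pixels outside the mask; use inpainted pixels inside it.
    cleaned = []
    for frame, mask, fill in zip(frames, masks, filled):
        cleaned.append([
            [f if m else p for p, m, f in zip(row, mask_row, fill_row)]
            for row, mask_row, fill_row in zip(frame, mask, fill)
        ])
    return cleaned


# Toy stand-ins: treat pixel value 255 as "operator" and fill with 0.
def toy_segment(frame: Image) -> Mask:
    return [[1 if p == 255 else 0 for p in row] for row in frame]


def toy_inpaint(frames: List[Image], masks: List[Mask]) -> List[Image]:
    return [[[0 for _ in row] for row in frame] for frame in frames]


cleaned = remove_operator([[[10, 255], [255, 20]]], toy_segment, toy_inpaint)
```

Passing the models in as callables keeps the pipeline logic testable without loading either network.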
Finally, the learned policy takes visual and tactile information as input, predicts joint positions and contact forces, and executes them through force control to achieve robust manipulation.
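To make the idea of "executing through force control" concrete, here is a minimal sketch of one possible control step. It is not the controller used in the paper: it assumes one controlled joint per finger, that increasing the joint value closes the finger, and a simple proportional correction with made-up values for `gain` and `max_delta`.

```python
from typing import List, Sequence


def force_control_step(
    q_pred: Sequence[float],   # joint positions predicted by the policy
    f_pred: Sequence[float],   # contact forces predicted by the policy
    f_meas: Sequence[float],   # forces currently measured at the fingertips
    gain: float = 0.01,        # hypothetical gain (joint units per unit force)
    max_delta: float = 0.05,   # hypothetical safety limit on the correction
) -> List[float]:
    """Shift each predicted joint target in proportion to the force error."""
    commands = []
    for q, f_target, f_now in zip(q_pred, f_pred, f_meas):
        delta = gain * (f_target - f_now)             # too little force: close further
        delta = max(-max_delta, min(max_delta, delta))  # clamp the correction
        commands.append(q + delta)
    return commands
```

With a correction of this kind, a finger that has reached its predicted position but is pressing too lightly keeps closing until the measured force approaches the predicted one, rather than stopping at the object's surface.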
No Problem with Squeezing Toothpaste, Placing Eggs, and Opening Bottle Caps
The team designed nine tasks that emphasize fine force control, multi-finger coordination, and interaction with everyday objects to verify the effectiveness of the approach.
These tasks cover a variety of dexterous manipulation skills, including challenging scenarios such as squeezing toothpaste onto a toothbrush (which requires continuous fine adjustment of pressure) and pressing a syringe (which requires stable one-handed force application and a coordinated grasp to prevent slipping or misalignment).
The experiments use a Franka Emika Panda robotic arm equipped with the Star Motion Era XHAND 1 dexterous hand. Each finger of the XHAND 1 has two joints, and the thumb and index finger each have an additional rotational joint, for a total of 12 degrees of freedom (5 × 2 + 2). Each finger is equipped with 120 tactile sensing points.
The team compared KineDex with three ablation variants:
(1) No force control: the force-control module is disabled at inference time, with the training settings unchanged;
(2) No tactile input: tactile sensing data is removed from the policy input during training, but the policy still predicts target fingertip forces and uses the same force-control scheme for execution;
(3) No inpainting: the image inpainting preprocessing step is omitted.
For each task, the team conducted 20 trials to evaluate the performance.
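For readers who want to reproduce this kind of evaluation, the bookkeeping is simple: a per-task success rate over the trials, then an unweighted mean across tasks. The sketch below assumes that averaging convention, and the trial outcomes in it are placeholders, not results from the paper.

```python
from typing import Dict, List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of successful trials for one task."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(outcomes) / len(outcomes)


def mean_success_rate(per_task: Dict[str, List[bool]]) -> float:
    """Unweighted mean of the per-task success rates."""
    rates = [success_rate(outcomes) for outcomes in per_task.values()]
    return sum(rates) / len(rates)


# Placeholder outcomes for two hypothetical tasks, 20 trials each.
trials = {
    "task_a": [True] * 15 + [False] * 5,
    "task_b": [True] * 10 + [False] * 10,
}
average = mean_success_rate(trials)
```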
The success rate of KineDex exceeds 70% in most tasks and approaches 100% in common pick-and-place scenarios such as bottle grasping and cup grasping.
Although performance declined slightly in the last three, more challenging tasks, the average success rate still exceeded 50%. This decline may be because these tasks demand finer positioning and contact reasoning than the current policy inputs can represent.
Nevertheless, the experimental results demonstrate the effectiveness of KineDex in learning everyday dexterous manipulation policies, which benefits from its natural fit with human behavior and the availability of precise tactile and force feedback.
The ablation results show that removing the force-control module significantly degrades system performance. With this module disabled, the average success rate across all tasks drops sharply to 16.7%, and even simple tasks such as bottle grasping become difficult to complete. Without force control, the dexterous hand often only touches the surface of the object without applying enough pressure, resulting in frequent failures in contact-rich tasks.
In tasks that depend heavily on contact (such as opening bottle caps, squeezing toothpaste, and pressing syringes), the absence of tactile input leads to a significant deterioration in performance, with the average success rate decreasing by 26.7%.
Without image inpainting to remove the human hand from the images, the success rate on all tasks drops to 0, and abnormal behaviors occur during execution.
Subsequently, the team further verified the advantage of KineDex in data collection compared with teleoperation through a comparative experiment.
The results show that the success rate of data collection with KineDex is close to 100%, while that of teleoperation is only 39%. This indicates that teleoperation demands more operator skill and repeated trial and error to produce high-quality demonstrations, making its data collection significantly less efficient than KineDex.
In terms of speed, data collection with KineDex is more than twice as fast as teleoperation. In the complex syringe-pressing task, a single KineDex demonstration takes only 50% of the time needed with teleoperation; in the simple bottle-grasping task, it takes less than one-third of the time.
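Success rate and time per demonstration can be combined into a single throughput figure: usable demonstrations per unit of time. The sketch below is illustrative arithmetic only, not a metric reported in the paper. It assumes that failed attempts take as long as successful ones and, in the example call, treats "close to 100%" as 1.0 and normalizes teleoperation time to 1.

```python
def usable_demos_per_unit_time(success_rate: float, time_per_attempt: float) -> float:
    """Expected number of successful demonstrations per unit of collection time."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be in [0, 1]")
    if time_per_attempt <= 0.0:
        raise ValueError("time_per_attempt must be positive")
    return success_rate / time_per_attempt


def relative_throughput(sr_a: float, t_a: float, sr_b: float, t_b: float) -> float:
    """How many times more usable demonstrations method A yields than method B."""
    return usable_demos_per_unit_time(sr_a, t_a) / usable_demos_per_unit_time(sr_b, t_b)


# Illustrative inputs based on the syringe-pressing figures quoted above:
# method A takes half the time per attempt and succeeds (nearly) every time,
# method B succeeds on 39% of attempts.
ratio = relative_throughput(1.0, 0.5, 0.39, 1.0)
```

The point of the calculation is that speed and success rate multiply: a method that is both faster per attempt and fails less often gains on both counts.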
A user study also shows that participants find KineDex's hand-in-hand teaching more intuitive and efficient, and better suited to collecting data for complex tasks.
Project Link:
https://dinomini00.github.io/KineDex/
Paper Link:
https://arxiv.org/abs/2505.01974
This article is from the WeChat official account "QbitAI". Author: Focus on cutting-edge technology. Republished by 36Kr with authorization.