How difficult is it for a robot to open a Coke and deal cards? Let's talk about the hardware and algorithms of dexterous hands.
In 2026, humanoid robots will enter the era of large-scale mass production. The clearest signal comes from Tesla: Optimus Gen 3, the "Golden Optimus," is expected to debut in the first quarter of 2026, and a production line with an annual capacity of up to 1 million units is planned to be built by the end of the year. Elon Musk has repeatedly stated that approximately 80% of Tesla's future value will come from Optimus rather than from automobiles. The key to the development of Optimus lies in its "hands and forearms."
Dexterous hands can be divided into three major driving paths, each with its own representative:
[Linkage Solution] It showcases the beauty of mechanical design but usually means low degrees of freedom. There are also linkage hands with high degrees of freedom, however, represented by South Korea's ILDA.
[Cable-Driven Solution] Its advantages are light weight, high degrees of freedom, and stable force output. Tesla's Optimus and the TetherIA dexterous hand that demonstrated opening a soda can at the Silicon Valley 101 Alignment Technology Conference both represent the "unidirectional cable" variant of the cable-driven solution. Representatives of the "bidirectional cable" variant include the Shadow Robot and the ORCA Hand, which are known as the "pearls on the crown of dexterous hands." However, this type of solution also faces problems such as cable routing extension, material creep, and high assembly difficulty.
[Direct-Drive Solution] Its advantage is convenient fine control. At industry exhibitions, the Sharpa robot amazed everyone with its motor direct-drive dexterous hands, dealing playing cards one by one and pressing the shutter of a camera. Its disadvantages are poor impact resistance and relatively high weight.
In this episode of "Silicon Valley 101," Hong Jun invited two experts in dexterous hand algorithms and hardware: Qi Haozhi, a former Meta robotics research scientist now at Amazon, and Tao Yiwei, the co-founder of TetherIA. They discuss the current state of dexterous hands, the characteristics of the different technical routes, and the challenges they face in data and algorithms.
The guests believe that for leading dexterous hand companies, it is not difficult to create a successful demo for a single task in the short term. The real breakthrough lies in generality and scalability. When algorithms enable a robot to learn diverse dexterous operations, such as opening a soda can, opening a door, and driving a screw, in a short period, the dexterous hand will have reached a generalization breakthrough similar to that of ChatGPT.
01 Abilities and Challenges of Robot Dexterous Hands
Hong Jun: This may be what confuses the audience most. In many demos, we can see robots picking up a vacuum cleaner, taking out the trash, boiling water, and even putting plates in the dishwasher. I remember that at one of Tesla's press conferences, Optimus poured wine on the spot. It seems that this hand is already very intelligent. So, could you summarize what scenarios a robot's hand can handle now and what level of development it has reached?
Qi Haozhi: Sure. I think that in the case of teleoperation, if the fingers of the hand do not require very precise movements, it is a relatively simple problem. For example, when Optimus pours wine, it just places its hand on the handle of the wine dispenser and presses down. This is relatively simple in terms of control.
In contrast, if we want a robot to use the various tools in our homes, such as screwdrivers and scissors, its fingers need to perform more precise and dexterous operations, and the task becomes very difficult. Moreover, the robot needs to adapt to the different tools found in thousands of households, which increases the difficulty exponentially.
Optimus demonstrates pouring wine. Image source: X
Hong Jun: You just mentioned "fine movement," and then there is what I understand as "generalization ability," which means handling different scenarios. Both of these aspects still need to be strengthened.
Tao Yiwei: Let me add a little. Mr. Qi just discussed this issue from the system perspective, so I will start from the hardware side. One task is to make existing hardware solutions more reliable, that is, to enable the robot to run stably for a long time in a real environment and interact with natural objects without being damaged over long-term use. In fact, this has not been fully achieved.
Moreover, the hardware still needs continuous iteration, such as increasing its degrees of freedom and adding tactile sensors. As system complexity increases in this process, reliability becomes an even greater challenge. So there is still a lot of work to do on the hardware side.
Hong Jun: Let me give the audience a more vivid example. At our company's annual meeting this year, Evan demonstrated opening a soda can with a robot on the spot. In fact, during the rehearsal, it was in an unstable state. For example, I also wanted to try to let the robot open the soda can, but I placed it at an arbitrary angle. At this time, the robot might need to rotate it to open the can. Is this rotation action difficult for the robot?
Tao Yiwei: Yes, this is a very good question. First of all, opening a soda can seems to only require a pair of hands and a fingernail, but when it comes to a dual-arm robot system, it is still a very challenging task. We only demonstrated it briefly, and there is still a long way to go in the future to make the whole process fully automatic and achieve a high success rate.
Aligning the soda can is a matter of precision, and the robot also needs to sense the can's current state. Humans can pick up the can in any posture, adjust its angle with one hand until it is ideally positioned, and then bring the other hand over to pull it open. The robot is not there yet, and closing that gap involves both the hardware design and the control capabilities of the hands.
Hong Jun: Are there other companies in the world that can do better in terms of degrees of freedom, such as rotating the soda can and then opening it?
Tao Yiwei: I think some leading companies can optimize their hardware and put more effort into this aspect to achieve such a demo. However, at present, I don't think any company can do it completely autonomously. I'd like to hear Mr. Qi's opinion on this.
Hong Jun: Right. Mr. Qi also has a paper on using vision and touch for in-hand rotation. He should be an expert in this field.
Qi Haozhi: My view is that different companies have different publicity strategies or research focuses. For hardware manufacturers like Mr. Tao's, the goal may be to prove that their hardware is very useful, whether in terms of mechanical structure or the control systems built on it. So, showing some impressive demos makes sense.
In contrast, there are also some companies that focus on dexterous hand algorithms but do not produce dexterous hand hardware. They may show less of such capabilities and more of the generalization ability.
As Mr. Tao said, if the goal is only to optimize this one video, some leading hardware manufacturers or algorithm research institutes can achieve it. However, in the long run, we should focus on which dexterous hand configuration suits the widest range of tasks and offers the best interface for algorithms.
Image source: Figure
Hong Jun: So, Haozhi, according to your research, do you think there are already companies that can open a soda can when I place it randomly? Setting aside variations in environment or scenario, I mean just the soda can, with its body and opening not necessarily facing the robot's hand directly.
Qi Haozhi: I don't think there are such companies at present. If a company wanted to complete this task today, it could technically be done in several months, but it would require a lot of resources and time. Considering their own development path, they may choose not to do this task itself and instead make algorithmic improvements that shorten the time needed to do such a task in the future.
Hong Jun: That is, they won't optimize for a single task; the current optimization direction is to make the hand adaptable to as many tasks as possible. They still value generalization ability more.
Qi Haozhi: Yes.
Hong Jun: I see. If so, I remember that Figure AI previously released some videos showing a robot putting plates in the dishwasher. So, is this video a successful case selected from many failed attempts? Or, as you said, is it actually a demonstration video through teleoperation and does not represent that the robot has such abilities?
Qi Haozhi: There is no definitive information on this. However, I think existing algorithms make it easy to shoot such videos in a fixed scenario. For example, if the success rate of the whole task is about 80% to 90%, shooting a video of a run that is completed autonomously is not particularly difficult. However, for humanoid robots to enter thousands of households, what they may lack is the ability to complete these tasks with a 100% success rate in every scenario. For example, when putting plates in the dishwasher, a 90% success rate may not be enough. If one out of ten plates is broken, people won't want to use this robot. So, what needs to be improved is the success rate and the long-discussed generalization problem.
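To make the 90% figure concrete, here is a minimal Python sketch of how per-plate success compounds over a full load. It assumes each plate is an independent attempt with the same success rate, which is a simplification not stated in the conversation; the function names and the load size of 10 are hypothetical.

```python
def all_succeed_probability(per_item_success: float, n_items: int) -> float:
    """Probability that n independent attempts all succeed."""
    if not 0.0 <= per_item_success <= 1.0:
        raise ValueError("per_item_success must be between 0 and 1")
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    return per_item_success ** n_items


def expected_failures(per_item_success: float, n_items: int) -> float:
    """Expected number of failed attempts out of n independent attempts."""
    return (1.0 - per_item_success) * n_items


if __name__ == "__main__":
    # Hypothetical load of 10 plates at several per-plate success rates.
    for p in (0.80, 0.90, 0.99):
        clean = all_succeed_probability(p, 10)
        broken = expected_failures(p, 10)
        print(f"p={p:.2f}: clean load {clean:.1%}, expected failures {broken:.1f}")
```

Under these assumptions, a 90% per-plate success rate gives roughly a one-in-three chance of loading ten plates without a single failure, which matches the intuition that one broken plate in ten is unacceptable.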
Hong Jun: So, currently, the industry still focuses on the generalization ability of dexterous hands.
Tao Yiwei: Yes. Let me also add that we need to analyze the overall difficulty of each task in detail. You just mentioned putting plates in the dishwasher and taking them out. Breaking this task down, it mainly involves the robot picking up the plates, opening the dishwasher door, and putting the plates on the rack. As Mr. Qi said, simple object grasping and pulling a lever are relatively simple tasks. The difficulty is not in the same order of magnitude as opening a soda can, which we just discussed.
Looking at opening a soda can carefully, it involves picking up the can from the table with the left or right hand, adjusting the direction of the can opening, then aligning the other hand in the air, inserting a finger into the pull ring at an appropriate angle, and pulling it open with an appropriate angle and force. Moreover, since both hands are operating on one object simultaneously, the holding hand needs to resist the pulling force. How to ensure that the fingers don't apply too much force and crush the can is also a problem. From the perspective of the overall robot control system, this is a much more difficult task than putting away plates.
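The tension between resisting the pull and not crushing the can can be sketched as a simple force window. The Python sketch below uses a basic Coulomb friction model, which is an assumption rather than anything described by the guests: the holding hand must squeeze hard enough for friction to resist the pull, yet stay below the force at which the can deforms. The function name, parameters, and all numbers are hypothetical.

```python
from typing import Optional, Tuple


def grip_force_window(
    pull_force_n: float,
    friction_coeff: float,
    n_contacts: int,
    crush_limit_n: float,
) -> Optional[Tuple[float, float]]:
    """Return (min, max) normal force per contact, or None if no safe grip exists.

    Coulomb friction assumption: total friction = friction_coeff * n_contacts * normal_force,
    and it must be at least as large as the pulling force on the tab.
    """
    if friction_coeff <= 0 or n_contacts <= 0:
        raise ValueError("friction_coeff and n_contacts must be positive")
    min_force = pull_force_n / (friction_coeff * n_contacts)
    if min_force > crush_limit_n:
        return None  # Holding firmly enough would crush the can.
    return (min_force, crush_limit_n)


if __name__ == "__main__":
    # Hypothetical numbers: 20 N pull, friction 0.5, 4 fingertip contacts, 30 N crush limit.
    print(grip_force_window(20.0, 0.5, 4, 30.0))
    # Fewer, more slippery contacts leave no safe window.
    print(grip_force_window(20.0, 0.2, 2, 30.0))
```

The point of the sketch is only that the safe window can be narrow or empty, which is why force control matters as much as positioning in this task.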
Image source: TetherIA
Hong Jun: So, what do you think are the bottlenecks of dexterous hands currently? Is it a problem of the entire robot industry, such as the model problem and generalization problem? Or, besides these, are there any unique difficulties in the dexterous hand industry?
Tao Yiwei: We can't just regard the dexterous hand as a hardware module. To generate value, it must be paired with at least a dual-arm system, which forms a minimum working robot. However, when we really want it to perform tasks in an environment in a generalized way, we will need a mobile chassis or a mobile platform. Given a mobile platform, people may then ask whether a wheeled one can handle more complex terrain, including moving the robot up and down. Then people may say that a full humanoid is more suitable. So, for the dexterous hand to really have value, it is definitely not something that can be solved by a single hardware module.
Qi Haozhi: The difficulties are definitely everywhere in the entire robot field. People often ask me what the most difficult part of developing dexterous hands is. I think currently, there is still a lot of room for improvement in both hardware and software.
From the software level, my understanding is that some relatively mature machine learning algorithms used on robotic arms or wheeled robots run into unexpected problems when applied directly to more complex systems such as dexterous hands or humanoid robots. For example, a dexterous hand may have four or five fingers, and each finger has several joints. Each joint may interact with the environment and the object. So, how can we ensure that the effects of these interactions are beneficial to us? For example, if we want to grasp an object with a gripper, we only need to consider how two contact points should touch the object. But if we now have ten contact points, how should each of these ten contact points touch the object? Some contact points may work against each other. In this case, the computational complexity increases significantly.
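One simplified way to see why ten contact points are so much harder than two is to count contact-mode combinations. The Python sketch below assumes each contact point can be in one of three states (separated, sticking, or sliding), a common simplification in contact modeling rather than something stated in the conversation; the names are hypothetical.

```python
from itertools import product

# Assumed per-contact states; real systems may model contact differently.
CONTACT_STATES = ("separated", "sticking", "sliding")


def count_contact_modes(n_contacts: int, states_per_contact: int = len(CONTACT_STATES)) -> int:
    """Number of joint contact-mode assignments for n independent contact points."""
    if n_contacts < 0:
        raise ValueError("n_contacts must be non-negative")
    return states_per_contact ** n_contacts


def enumerate_contact_modes(n_contacts: int) -> list:
    """Explicitly list every assignment; only practical for small n."""
    return list(product(CONTACT_STATES, repeat=n_contacts))


if __name__ == "__main__":
    print(count_contact_modes(2))   # two-finger gripper
    print(count_contact_modes(10))  # ten contact points on a dexterous hand
```

With three states per contact, a two-contact gripper has 9 combinations while ten contacts give 59,049, so any method that reasons over contact modes explicitly scales poorly as fingers and joints are added.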
From the hardware level, I started researching dexterous hands around 2021 or 2022. At that time, there were very few dexterous hands that we could buy and use. In the past one or two years, dexterous hand hardware, especially from manufacturers in China and some hardware companies in the United States, has made great progress. So, I think the bottleneck on this side is gradually easing. However, I predict that it will still take several rounds of iteration to reach a relatively convergent configuration, like the Unitree robots we see now.
Hong Jun: Currently, the dexterous hands available on the market have different shapes and hardware. So, you need to adjust your software according to the hardware.
Qi Haozhi: Well, most of them are designed to be similar to human hands. However, each company's technical route is different. Mr. Tao's company should use the cable-driven technical solution, and there are also some companies that place motors on the fingers of the dexterous hand as the driving solution.
02 An Overview of the Three Major Technical Paths of Dexterous Hand Hardware
Hong Jun: Actually, when it comes to technical paths, as far as I know, there are several popular ones in the industry. One is the linkage-driven type, and another is the cable-driven type, which is further divided into unidirectional and bidirectional cable-driven. There is also the motor-driven type. Could you briefly introduce the advantages and disadvantages of these technical routes? Which direction is the mainstream in the industry currently? Is there a trend of convergence?
Tao Yiwei: Let me start from the hardware perspective, and then ask Mr. Qi to add the user's perspective on which one is preferred.
First, let's look at the three main methods: linkage, cable-driven, and direct-drive.
The linkage method was originally used in traditional prosthetic hands. It uses a driver mounted at the base, whether a linear push rod, an electric cylinder, or a worm gear that generates rotational motion, to bend the fingers. This is the traditional low-degree-of-freedom dexterous hand, with six degrees of freedom. It looks more like a hand, but the motion of its fingers is still limited: each fingertip follows a fixed one-dimensional trajectory. The thumb is designed so that, after it swings sideways, it lines up with the position of the index finger or the middle finger, and it also closes along a fixed trajectory. So, from the usage perspective, it does not differ that much from a gripper. These are the characteristics of the low-degree-of-freedom linkage hand.
Hong Jun: For a low-degree-of-freedom dexterous hand with six degrees of freedom, it's essentially the five fingers closing, so where is the sixth degree of freedom?
Tao Yiwei: It's the lateral swing of the thumb.
Hong Jun: That is, the thumb has two degrees of freedom, and the other fingers each have one degree of freedom.
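The six-degree-of-freedom allocation just described can be written down as a small data structure. This Python sketch is only illustrative: the joint names are hypothetical, and only the split of two degrees of freedom for the thumb and one for each other finger comes from the conversation.

```python
# Hypothetical joint names; the 2 + 1 + 1 + 1 + 1 split is from the discussion above.
SIX_DOF_LINKAGE_HAND = {
    "thumb": ["flexion", "lateral_swing"],
    "index": ["flexion"],
    "middle": ["flexion"],
    "ring": ["flexion"],
    "little": ["flexion"],
}


def total_dof(hand: dict) -> int:
    """Sum the actuated degrees of freedom across all fingers."""
    return sum(len(joints) for joints in hand.values())


if __name__ == "__main__":
    for finger, joints in SIX_DOF_LINKAGE_HAND.items():
        print(f"{finger}: {len(joints)} DoF ({', '.join(joints)})")
    print("total:", total_dof(SIX_DOF_LINKAGE_HAND))
```

Because each finger has a single actuated flexion, every fingertip is confined to one curve in space, which is the fixed one-dimensional trajectory Tao Yiwei mentions.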