A famous robotics expert: the future of humanoid robots is to be non-humanoid.
In this article, I will explain why, despite venture capital firms (VCs) and large tech companies investing hundreds of millions or even billions of dollars in training, today's humanoid robots still can't learn how to be dexterous.
Two additional short essays are attached at the end of this article. The first explores the issues that still need to be resolved to keep humans safe when they are near bipedal humanoid robots while the robots are walking. The second argues that although we will have large numbers of humanoid robots 15 years from now, they will look neither like today's humanoid robots nor like humans.
I. Preface
Since the term "artificial intelligence" first appeared in the 1956 proposal for the "Dartmouth Summer Research Project on Artificial Intelligence", AI researchers have spent over 65 years exploring how to make robot arms and hands manipulate objects.
By 1961, Heinrich Ernst had completed a doctoral thesis describing a computer-controlled arm and hand connected to the MIT TX-0 computer. The device could pick up blocks and stack them, and remarkably, the process was even recorded on video. His supervisor was Claude Shannon, and he also thanked Marvin Minsky for his guidance; both were among the four authors of the Dartmouth AI proposal.
This research gave birth to industrial robots. Industrial robots, then and now, are computer-controlled arms equipped with various "end-effectors" (which can be thought of as simple hands), and they have been in wide use in factories around the world for 60 years.
Recently, a new generation of researchers has hit upon the idea of building humanoid robots, and you may have noticed some of the associated hype. Gartner says that humanoid robots are still in the early stages of development, far from the peak of the hype cycle. The following figure, only a year old, shows humanoid robots at the starting point of the cycle, while generative AI has passed its peak and is heading toward the trough:
The design concept of humanoid robots is to adopt the same body structure as humans and work like humans in an environment created for humans. This concept holds that instead of manufacturing different types of specialized robots, we only need to develop humanoid robots that can perform all human jobs. For example, the CEO of the humanoid robot company Figure said:
"We either build millions of different types of robots for specific tasks or create a single humanoid robot with a universal interface that can perform millions of tasks."
Here is the first stage of his "master plan":
1. Build a fully functional mechatronic humanoid robot.
2. Achieve human-like manipulation capabilities.
3. Integrate humanoid robots into the labor market.
This past summer, Tesla's CEO, speaking about the humanoid robot named "Optimus", said that it could generate $30 trillion in revenue, calling humanoid robots "perhaps the biggest product in the world".
For these two companies, and perhaps a few others, the overall plan is to make humanoid robots "plug-and-play" replacements for humans: able to take over all kinds of physical labor at lower cost and the same level of performance. In my opinion, the idea that this goal can be achieved within a few decades is pure fantasy. Yet many people predict it will happen as soon as two years from now, and the more "conservative" proponents believe humanoid robots will have a significant economic impact within five years.
My company develops robots deployed in warehouses. They use a new wheel-based mobility system (our mobility system is genuinely a novel design; just two years ago it did not exist at all). When we pitched to venture capital firms (VCs) to raise funds to scale up and meet customer demand, we were asked: "Since everyone knows that bipedal, two-handed humanoid robots will take over most human jobs in two years, why are you still so set on developing warehouse robots?"
My personal opinion may ultimately not matter, but the key point is that the hype around humanoid robots stems from the idea that they will become general-purpose machines capable of performing any physical task a human can do. There would be no need to change existing work methods to replace human labor with automation: humanoid robots would simply take over existing work without any laborious process changes. For this to happen, the manipulation capabilities of humanoid robots must reach the human level, just as we now expect self-driving taxis to have human-level driving skills on urban roads.
Therefore, for humanoid robots to make practical sense both economically and technically, they must have human - like manipulation capabilities. Among the supporters of humanoid robots, this view is undisputed, and this is also the fundamental reason for the existence of humanoid robots. Humanoid robot developers believe that to realize their practical value, they must quickly make the dexterity of humanoid robots approach the human level.
II. A Brief History of Humanoid Robots
For decades, many researchers have worked on humanoid robots. As early as the mid-1960s, researchers at Waseda University in Tokyo, Japan, began studying bipedal walking mechanisms. By the early 1970s, the university's humanoid robotics group had developed the first humanoid robot, WABOT-1 (Waseda Robot).
WABOT-2 followed in the early 1980s, and Waseda University has continued to develop new humanoid robots since. The Japanese automaker Honda began developing bipedal walking robots in the late 1980s and finally launched the humanoid robot ASIMO in 2000.
Sony first developed and sold the robotic dog Aibo, then launched the small humanoid robot QRIO in 2003, though it never actually sold that model. The French company Aldebaran launched the small walking humanoid robot NAO in 2007, which replaced Aibo as the standard platform for the 30-year-old international robot soccer league.
Aldebaran later launched the larger humanoid robot Pepper, with limited commercial success. Boston Dynamics, spun out of MIT 35 years ago, launched the humanoid robot ATLAS in 2013 after years of research on quadruped robots.
In addition to the early research on humanoid robots in Japan, many academic teams around the world have worked on human-like robots; some of these robots have legs, some don't, some have arms, some don't.
My research team at MIT started developing the humanoid robot Cog in 1992 and developed seven different versions of the platform successively. In 2008, I founded Rethink Robotics and launched two humanoid robots, Baxter and Sawyer, with thousands of units sold and deployed in factories around the world.
Some of my former postdoctoral researchers returned to Italy and launched the RobotCub open-source humanoid robot project, which helped dozens of AI laboratories around the world successfully develop humanoid robots.
For decades these teams have worked on humanoid robots, exploring how to make them walk, manipulate objects, and interact with humans in environments built for humans. The "International Journal of Humanoid Robotics" was founded in 2004, initially in print form.
Its content is now accessible online, and it has published 22 volumes of research papers.
2.1 The Manipulation Challenges of Humanoid Robots
In 1961, making a robot manipulate objects with its arm and hand was already a daunting task for Heinrich Ernst. Robot researchers, industrial engineers, and today's practitioners have faced the same problem ever since.
The parallel gripper was invented in the mid-1960s. It has two parallel "fingers" that open and close, and it remains the mainstream form of robot hand today. On the left side of the following figure is the parallel gripper I used when developing robots at Stanford University in the 1970s; on the right is the model produced and sold by my company, Rethink Robotics, in the mid-2010s. Both are electrically driven.
The only difference is that the more modern gripper on the right has a built-in camera, which lets it align with the target object through visual servoing. In the 1970s this function could not be offered in a reasonably priced product; the computing power simply wasn't there.
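To make the idea of visual servoing concrete, here is a minimal sketch of the closed loop a gripper-mounted camera runs: measure the pixel offset of the tracked object from the image center, then command a proportional corrective motion until the object is centered. All names, gains, and the ideal camera model are illustrative assumptions, not any vendor's actual API.

```python
# Minimal 2-D image-based visual servoing sketch (illustrative only).

def visual_servo_step(feature_px, image_center, gain=0.5):
    """Proportional control: return a lateral velocity command that
    drives the tracked image feature toward the image center."""
    ex = image_center[0] - feature_px[0]
    ey = image_center[1] - feature_px[1]
    return gain * ex, gain * ey

def servo_until_centered(feature_px, image_center=(320, 240),
                         tol=1.0, max_iters=100):
    """Simulate the closed loop. Idealized assumption: each unit of
    commanded motion shifts the feature by one pixel in the image."""
    x, y = feature_px
    for _ in range(max_iters):
        vx, vy = visual_servo_step((x, y), image_center)
        if abs(vx) < tol and abs(vy) < tol:
            break
        x += vx
        y += vy
    return x, y

# Start with the object seen at pixel (100, 400); the loop converges
# to within a couple of pixels of the 640x480 image center.
final = servo_until_centered((100.0, 400.0))
```

In a real gripper the pixel error must be mapped through the camera calibration into arm motion, and the loop runs against noisy images, but the proportional-correction structure is the same.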
The German company Schunk produces and sells more than 1000 types of parallel grippers, both electric and pneumatic (driven by compressed air), compatible with robot arms. The company also sells some three-finger radially symmetric hands and a small number of specialized grippers. Currently, no robot hand with multi-jointed fingers (i.e., fingers with movable joints) has sufficient durability, strength, and service life to meet the requirements of actual industrial applications.
Where compressed air is available, it can be converted into suction through a Venturi ejector. So another common type of robot hand uses one or more suction cups to grasp objects by adhering to their surfaces. The following figure shows the suction-cup gripper launched by Rethink Robotics, which can be used alongside the electric parallel gripper.
Single- and multi-suction-cup end-effectors (the "hand-like" devices at the end of the arm) are widely used in handling finished goods, for example packing products into standardized boxes, or moving boxes and packages bound for consumers. In fact, the soft materials used in shipping packaging and suction-cup end-effectors have co-evolved: suction cups can grasp soft-packed products destined for households more easily and quickly than other methods can.
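A back-of-envelope calculation shows why a single suction cup is enough for most parcels. The ideal holding force is just the pressure differential times the cup area. The specific cup size and vacuum level below are illustrative assumptions, not figures from the article.

```python
import math

def suction_holding_force(cup_diameter_m, vacuum_kpa):
    """Ideal holding force in newtons: F = delta_P * A.
    Real systems derate this with a safety factor for porous,
    curved, or dusty surfaces."""
    area = math.pi * (cup_diameter_m / 2.0) ** 2
    return vacuum_kpa * 1000.0 * area

# Illustrative numbers: a 40 mm cup at 60 kPa of vacuum (Venturi
# ejectors commonly pull roughly 50-90 kPa below atmosphere).
force_n = suction_holding_force(0.040, 60)  # about 75 N
payload_kg = force_n / 9.81                 # roughly 7-8 kg, ideally
```

Even with a generous safety factor, a modest cup handles the few-kilogram packages typical of e-commerce shipping, which is part of why this end-effector dominates warehouse handling.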
Over the past few decades, researchers have developed many robot hands with multi-jointed fingers that mimic human hands. The following figure shows several models developed by John Hollerbach, Ken Salisbury, and Yoky Matsuoka respectively.
In general, however, no human-like robot hand has shown significant dexterity, and no design has made it into practical application. Most dexterity research to date has used mathematical and geometric methods, but these have never brought robot hands anywhere near human-level dexterity.
You may have seen amazing videos of human-like robot hands performing specific tasks, but those abilities are completely unable to transfer to any scenario beyond that one task. Recently, Benjie Holson (disclosure: I work closely with Benjie at Robust.AI) proposed a "Humanoid Robot Olympics" in a light-hearted yet insightful blog post, listing 15 tasks that any 8-year-old human can do and setting up medals for them.
For example, one challenge is to hang up a men's shirt that has one sleeve inside out and fasten at least one button. Another is to clean peanut butter off the robot's own hand. You cannot excuse a failure by saying "this task is better suited to some other robot mechanism", because the core value proposition of humanoid robots is the ability to perform all tasks that humans can.
After reading Benjie's 15 challenges, it is easy to come up with another 15 or 30 dexterity tasks that have little in common with his, yet which humans perform without thinking. And beyond those are still more complex tasks that humans can carry out when needed.
2.2 An Idea That Once Worked
So what should we do? How can we make humanoid robots dexterous? I suspect many people have had this thought:
In the past 20 years, end-to-end learning has achieved remarkable results in at least three fields: speech-to-text, image labeling, and now large language models (LLMs). So instead of trying to solve the dexterity problem with mathematical methods, why not adopt end-to-end learning directly? Collect a large amount of data on humans using their hands to perform tasks, feed it into a learning system, and out comes a dexterous robot control model. And then our company can be valued at billions of dollars.
Stop overthinking and just do it!
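The idea sketched above, mapping recorded human demonstrations directly to motor commands, is essentially behavior cloning. Here is a minimal sketch of that pipeline in its simplest possible form: a linear policy fit by least squares to simulated demonstration data. Everything here is an illustrative assumption (real systems train large networks on camera images, not a 3-dimensional toy state).

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend demonstration data: each row pairs an observed state (e.g. a
# few hand-pose features) with the action a human demonstrator took.
# The "expert" here follows an unknown linear policy plus sensor noise.
true_policy = np.array([[0.5, -0.2],
                        [0.1,  0.9],
                        [-0.3, 0.4]])          # 3 state dims -> 2 action dims
states = rng.normal(size=(500, 3))
actions = states @ true_policy + 0.01 * rng.normal(size=(500, 2))

# Behavior cloning in its simplest form: fit a policy mapping state to
# action by least squares over the recorded demonstrations.
learned_policy, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The cloned policy now imitates the demonstrator on unseen states.
test_state = rng.normal(size=(3,))
predicted_action = test_state @ learned_policy
```

The catch, as the rest of the article argues, is that this only works to the extent that the recorded demonstrations actually capture the signals (forces, touch, full-body sensing) the task depends on.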
The approach humanoid robot companies and academic researchers have chosen is, roughly, to let a learning system "watch" videos of humans performing manipulation tasks and try to learn the actions the robot needs in order to perform the same tasks. In a few cases, humans teleoperate the robot (while watching the robot and the objects being manipulated) and receive a small amount of force and tactile feedback, but most of that feedback comes from the robot's hand, not the wrist, elbow, shoulder, or hip, and the accuracy of the tactile data is very low.
Benjie Holson pointed out in his blog post that the currently collected data is not only scarce in quantity but also low in accuracy. I fully agree with his criticism. Here is his view, which is already very clear and I don't need to add more:
"The currently effective method I've seen is 'learning from demonstration'. Researchers prepare robots and control interfaces (the standard setup is either two identical robots, where a human grasps and moves one while the other mimics it synchronously, or an Oculus headset and controllers with hand tracking), and then record a 10-to-30-second action over and over (hundreds of times).