Why can't robots work as laborers yet?

They still have to pass the tests of intelligence and cost.

People are "fascinated" by humanoid robots.

At the just-ended 2025 World Robot Conference (WRC), these "iron warriors" were the superstars. Throughout the five-day exhibition the venue was packed, and the companies' booths were surrounded by crowds. Visitors exclaimed "Wow" and frantically took pictures with their phones. Social media was flooded with short videos about robots.

More and more people are surprised by how fast humanoid robots are evolving. They are no longer clumsy lumps of cast iron but have dexterous hands and feet. Their skin is so realistic that one could almost mistake it for the real thing. They can even raise their eyebrows, smile, and wink.

The capabilities are also developing at all levels –

Performance ability: They are proficient in dancing, catwalking, boxing, playing football, etc.;
Working ability: They can replace humans in areas such as household organization, coffee preparation, and industrial transportation;
Communication ability: They understand human language and can conduct simple natural conversations, gradually getting rid of the label of "artificial stupidity".

But they also have quite a few bugs –

Simple movements: The robots of several manufacturers have almost identical movements when dancing, rolling backward, and falling, which was criticized by users as "lazy programming";
Low efficiency: They fold clothes as slowly as a sloth, and in industrial applications they remain at the basic level of sorting;
High price: A high-end robot costs as much as a BMW.

Nevertheless, they are stumbling out of the laboratory and getting closer to the real world. Competitions such as "marathons", "sports championships", and "boxing matches" for humanoid robots still dominate the domestic and foreign media.

Recently, "Dingjiao One" talked with some leading companies and experienced experts in the humanoid robot industry. Although it is still too early to talk about mass application, and "intelligence" and "cost" are still the bottlenecks, technological progress, capital investment, and market demand are accelerating this process. In the future, humanoid robots could turn our perception of work, efficiency, and intelligence upside down.

How have humanoid robots developed?

What exactly are humanoid robots? It's not as simple as most people think. Let's look at it from three perspectives – appearance, interaction mode, and application areas.

Let's first look at what today's humanoid robots look like.

Normally, they have a human-like structure with a torso, head, neck, and limbs. In practice, however, the design of hands and feet varies greatly. Hands fall into three types: the dexterous hand (a bionic five-finger design that can mimic the fine movements of a human hand), the two-finger gripper, and the three-finger hand. Feet are divided into bipedal and non-legged types.

According to experts, although the dexterous hand and the bipedal design are closer to the human form, their functions are relatively simple and their costs are high. One expert revealed that a high-quality dexterous hand can cost 100,000 to 200,000 yuan.

Kris, an expert with many years of experience in the Internet and self-driving car industries as well as in Embodied Intelligence, told "Dingjiao One" that a dexterous hand can account for one-third of a robot's total cost. Costs can multiply when switching from a two-finger gripper to a three-finger hand, but the gains in performance do not necessarily rise in proportion.

To achieve better performance and a good cost-performance ratio, most humanoid robot companies prefer a less human-like gripper and a wheeled base, unless customers have special requirements.

For example, the exhibitor Xingchen Intelligence used its Astribot S1 to demonstrate complex tasks at the exhibition, such as preparing breakfast, making coffee, and painting on partitions. All of these were carried out with a two-finger gripper.

An Zhaohui, the R&D director of Xingchen Intelligence, told "Dingjiao One" that a humanoid robot's manipulation functions are concentrated mainly in the upper body, and that they depend not on the gripper alone but on the entire upper body. The company has therefore developed an innovative cable-drive system for the key parts of the whole robot, which closely mimics human muscles and the way they exert force. At the same time, it is safer and delivers more human-like, dynamic performance.

Now let's look at the interaction mode. Kris explained that there are mainly three methods to control humanoid robots: remote control (human movements are captured through sensors, controllers, and other devices), isomorphic arm (movements are transferred to the robot arm through joint mapping), and voice control. More complex commands such as preparing breakfast are carried out with remote control and isomorphic arm, while simple commands such as pushing, grasping, and placing can be executed with voice control or even autonomously.

But no matter which method is chosen, it is still far from real autonomous control by AI. Even the seemingly intelligent "voice control" is mostly based on predefined rules. The robot seems to have its own consciousness, but it lacks real adaptability to the situation.
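To make the point about predefined rules concrete, here is a minimal Python sketch. It is not any manufacturer's implementation: the keyword table, the action names, and the `interpret` function are all hypothetical, chosen only to show that a fixed rule table handles the phrasings it anticipates and nothing else.

```python
from typing import Optional

# Hypothetical rule table (illustrative only): one keyword per simple command
# of the kind the article mentions (pushing, grasping, placing).
RULES = {
    "push": "PUSH_OBJECT",
    "grasp": "GRASP_OBJECT",
    "place": "PLACE_OBJECT",
}


def interpret(utterance: str) -> Optional[str]:
    """Return the action for the first known keyword, or None if no rule matches."""
    for word in utterance.lower().split():
        # Strip trailing punctuation so "cup." and "grasp," still match.
        action = RULES.get(word.strip(".,!?"))
        if action is not None:
            return action
    return None


if __name__ == "__main__":
    for text in ["Please grasp the cup.", "Pick up the cup."]:
        print(text, "->", interpret(text))
```

In this sketch, "Please grasp the cup." matches the `grasp` rule, while "Pick up the cup." expresses the same intent but matches no rule, so `interpret` returns `None`. That gap between a fixed vocabulary and open-ended language is the lack of adaptability described above.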

It is important to note that remotely controlling a humanoid robot is not like driving a remote-controlled car; it also requires a certain level of technical skill.

Gashero, a guest engineer at the School of Computer Science of Peking University with rich practical experience in the Internet, self-driving car, and robotics industries, explained to "Dingjiao One" that although it looks as if someone is steering the robot with a remote-control device, the remote control actually sends commands rather than directly controlling the lowest level of the robot. There are still many subtasks that the robot itself must plan and execute. For example, the robot must keep its balance and coordinate the many motors and sensors on its body to carry out the target movement. This requires technical know-how.

Finally, let's look at the application areas.

According to experts, humanoid robots can be clearly divided into To B (enterprise level) and To C (consumer level). The To B area mainly includes four fields: entertainment and shows, industrial manufacturing, tourism services, and medical care. The To C area focuses on the private household sector. Kris summarized that the goal of humanoid robots is to replace the traditional "three occupations" (security guard, cleaner, housekeeper).

Kris said that the entertainment and show industry is currently the most developed application area. Various dances, catwalks, and competitions often appear. The other areas are still in the initial stage of application. For example, industrial manufacturing is mainly limited to sorting and transportation on the production line, and tourism services are mainly limited to guiding in tourist areas.

[Figure. Source: Kris's "Robot Awakening Notes", provided by the interviewee]

But from the perspective of actual use, Gashero believes that many humanoid robots currently make little impression once they "start work".

For example, warehouse AGV robots (a combination of machine vision and a robot arm) can already move boxes very well and cost-effectively, so humanoid robots are not strongly competitive there. As for the entertainment and show industry, he thinks it is not sustainable: "After the curiosity fades, the robots actually have to create real value."

In summary, humanoid robots have made great progress in recent years, but there are still some thresholds to overcome before they can fully realize their value.

The "cost barrier" and the "intelligence barrier" must be overcome

Several experts have summarized that the main problems of humanoid robots currently are that they are "not smart enough" and "not inexpensive enough".

You can imagine a humanoid robot as a "person" consisting of a "body" and a "brain". The hardware is its "body", which is also referred to as the main part of the robot by experts. The software is its "brain", which controls all thinking and action processes.

It is a consensus in the industry that the movement ability of humanoid robots in China is becoming more and more mature and can meet the basic operating requirements. But compared with the strength of the "body", there are big problems with the "brain" of humanoid robots. The current state of intelligence development in the industry is very unbalanced. Kris directly said that the software of humanoid robots still remains at the level of a demo version, like a child who has just learned to walk and can only walk in a certain small area.

Large language models keep getting smarter because they are constantly learning from a huge amount of data. The same is true for humanoid robots. They must carry out numerous interactions in the real physical world to collect data and train their decision-making and action abilities. In reality, however, data on physical-world processes is very scarce, which severely restricts the development of humanoid robots.

Kris said that the software of humanoid robots is essentially built on a VLA (vision-language-action) architecture. In this architecture, the "brain" must recognize objects and instruct the "body" to execute the movement. For this, it relies on accurate, real spatial data.

For example, a humanoid robot must know where to hang the laundry and what the exact coordinates of that place are when it is to complete the task of hanging laundry. However, exactly these data are lacking in the real world. Therefore, many humanoid robots must be fixed at a certain place when they are to execute a certain movement, and the objects they are to grasp must also be within their field of vision, as if they were tied by an invisible rope.

But the intelligence of some humanoid robots has already advanced.

For example, the Astribot S1 from Xingchen Intelligence (based on the company's whole-body VLA model) can independently tidy up objects in the household task of "tidying up the desk", even when it encounters many unknown objects or abnormal disturbances. Even when the scenario is moved to the WRC venue, only a small amount of additional data is needed and the model can still be used.

This is due to the closed loop formed by the self-developed model, the robot, and the large amount of data accumulated so far. The "meta-ability library" learning method enables the robot to keep collecting interaction information from different scenarios and to transfer its abilities to new tasks without starting from scratch. It is like a child learning about the world through analogies.

But An Zhaohui also told "Dingjiao One" that general-purpose generalization in humanoid robots is still a headache for the entire industry. At present, a robot can only generalize to similar scenarios and cannot, like ChatGPT, answer questions from different industries. In short, it is still a specialist in one area, not an all-rounder.

Several experts have indicated that synthetic data is the key to the rapid implementation of Embodied Intelligence. Companies like Galaxy Universal are specialized in research in the field of Embodied Intelligence and have achieved leading positions in the field of the "brain".

Take the Galbot of Galaxy Universal as an example. In its offline shop in Zhongguancun, Haidian, Beijing, it can autonomously carry out the entire processes such as order acceptance, ordering and payment, goods collection, personal handover, and multilingual interaction with customers without being remotely controlled by humans. When handling more than 300 different types of hot and cold drinks, it can also grasp accurately without knocking over other goods.

Zhao Yuli, the strategy director of Beijing Galaxy Universal Robot, told "Dingjiao One" that this is based on GroceryVLA, the world's first end-to-end Embodied Intelligence large model for retail, developed in-house by Galaxy Universal. Built on a large amount of synthetic data and Sim2Real technology (fusion of virtual and real worlds), GroceryVLA applies a unified grasping strategy across different categories and objects without separate parameter tuning for each product, and it has a strong capacity for autonomous decision-making and for resisting interference.

At this year's WRC event, Rev Lebaredian, the vice president of Nvidia Omniverse and simulation technology, Wang Xingxing from Unitree Technology, and Wang He, the founder of Galaxy Universal, stood together on the stage. Nvidia announced that it will provide the first Jetson Thor chips in China to...

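The views expressed in this article are solely those of the author; the 36Kr platform provides information storage services only.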
