Non-dancing Humanoid Robots Attempt to Do Real "Work"

Humanoid robots are appearing at the Spring Festival Gala while companies push them toward real-world deployment along multiple paths.

Humanoid robots have made the Spring Festival Gala their new battleground. Embodied intelligence companies such as Unitree, Magic Atom, Galaxy Universal, and Songyan Power have recently announced that they will appear at the gala. Amid the excitement, the sentiment that "we don't need one million dancing robots" has begun to spread online.

In fact, while staying highly visible through demonstrations such as dancing, embodied intelligence companies are also accelerating efforts to put humanoid robots to work in real-world scenarios. After a period of development, each company has achieved interim results on its own path.

Independent Variable Robotics, which announced the completion of a 1-billion-yuan financing round at the beginning of the year, recently released live footage of its robot carrying out food delivery autonomously from start to finish, based on the company's self-developed end-to-end VLA model.

Independent Variable Robotics is one of the earliest companies to bet on the end-to-end embodied large-model route. The core of this route is a unified embodied large model that lets a robot make continuous decisions in real environments, from perception through reasoning to action execution. Followed to its end, this path points humanoid robots toward general-purpose labor.

However, an industry insider told a reporter from Science and Technology Innovation Board Daily that the end-to-end embodied large-model approach is neither a lightweight route nor one with high certainty.

For one thing, model training depends heavily on real-world interaction data, which makes validation hard to scale and replicate. For another, while a unified model improves the system's integrity, it also magnifies the complexity of engineering debugging and fault localization. In real-world scenarios especially, robots often need to carry out continuous operations across environments and over long task chains, which places extremely high demands on the model's stability and fault tolerance. "This also means that companies choosing this route often need greater capital and resource investment and a longer cycle to commercialization."

RoboSense has also demonstrated its robot's capabilities in a delivery scenario. In a 100-minute video, the robot completed a sequence of nearly 20 step-by-step tasks without human intervention, including unpacking, folding recycling bins, moving items, cross-scenario navigation, and elevator interaction. The focus was on verifying stability and action consistency over long-duration operation.

However, behind the similar skills lies a different technical logic. A RoboSense technical staff member told a reporter from Science and Technology Innovation Board Daily that the company's embodied intelligence solution is not a conventional VLA but VTLA-3D, an extension built on it. "By introducing information such as 3D lidar point clouds and dexterous-hand tactile sensing in addition to vision, the model's understanding of spatial structure and physical constraints is improved."

In their view, higher-density perceptual input helps reduce dependence on large-scale data during training. "The effective training data volume required for the model to reach the currently demonstrated level of ability is about 200 hours, and training converges relatively quickly."

They further pointed out that this path is closely tied to RoboSense's long-term experience in intelligent driving. "In autonomous driving practice, it has been found that compared with the pure vision route, a model that fuses 3D lidar point clouds with visual information requires an order of magnitude less data to achieve the same performance goal."

Industry insiders said this reflects two current paths for putting embodied intelligence into practice. One raises information density by introducing multimodal perception such as lidar and tactile sensing, which reduces the scale of training data needed and prioritizes stable execution in real environments. The other adheres to the pure vision route, relies on large-scale data and model capability, and aims to approach cross-scenario general intelligence over the long term.

Put simply, the multimodal route emphasizes present-day usability, while the pure vision route bets on long-term generality. The two address problems at different stages.

Unlike the two paths above, which center on model capability, there is a third route that is more engineering- and delivery-oriented. This route does not try to solve general intelligence at an early stage. Instead, through task decomposition, modular combination of capabilities, and a strong control system, it enables robots to complete work stably within relatively clear task boundaries. Companies on this route usually have deep technical experience in robot hardware.

Under this technical logic, the prerequisite for robots to "work" is that tasks are fully structured. Which actions must be completed, in which environment the robot operates, and how abnormal situations are handled are all explicitly broken down at the system design stage and covered one by one through engineering. The advantages are strong controllability and system stability, and the approach can be deployed quickly in semi-structured scenarios such as industrial settings and inspection. It is also the route with relatively high certainty in current shipments and deliveries.
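As a rough illustration of what "fully structured" means here, the sketch below splits a task into named steps, each with a failure policy fixed at design time. It is a minimal sketch, not any company's actual system: the names `Step`, `run_task`, and `make_flaky`, the step names, and the "abort"/"skip" policies are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One unit of a decomposed task (hypothetical structure)."""
    name: str
    action: Callable[[], bool]   # returns True on success
    max_retries: int = 1         # extra attempts allowed after the first
    on_failure: str = "abort"    # design-time policy: "abort" or "skip"


def run_task(steps: List[Step]) -> Dict[str, str]:
    """Run steps in order, applying each step's predefined failure policy."""
    log: Dict[str, str] = {}
    for step in steps:
        succeeded = False
        for _ in range(step.max_retries + 1):
            if step.action():
                succeeded = True
                break
        if succeeded:
            log[step.name] = "done"
        elif step.on_failure == "skip":
            log[step.name] = "skipped"
        else:
            log[step.name] = "aborted"
            break  # later steps are never attempted
    return log


def make_flaky(failures: int) -> Callable[[], bool]:
    """Build a stand-in action that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def action() -> bool:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return False
        return True

    return action


if __name__ == "__main__":
    steps = [
        Step("open_box", make_flaky(0)),
        Step("grasp_item", make_flaky(1), max_retries=2),
        Step("scan_label", make_flaky(5), max_retries=1, on_failure="skip"),
        Step("place_item", make_flaky(0)),
    ]
    print(run_task(steps))
```

Every step and every recovery rule in this sketch has to be written out by hand, which is where the controllability of the route comes from.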

However, the corresponding bottlenecks are also fairly clear. Because the capabilities rely more on rules and engineering configuration, this route adapts poorly to changes in scenario. Once the environment or task changes significantly, system debugging and adaptation often have to be redone, and the cost of expansion grows linearly with the number of scenarios.

From a broader industry perspective, the divergence in paths among embodied intelligence companies is essentially an attempt to overcome, in different ways, several real-world thresholds to the large-scale application of humanoid robots. Several industry insiders agreed in interviews that for humanoid robots to move from demonstration to large-scale deployment, problems such as safe coexistence with humans, continuous operation, dexterous manipulation, and cost control must at least be solved together.

Until these constraints are systematically overcome, the various technical routes amount to partial advances against different thresholds: some bet on general intelligence first, some solve engineering usability first, and some achieve stable delivery through task decomposition. This also means that in the short term, humanoid robots are more likely to enter real production and service systems with clearly bounded capabilities and application scenarios than to become general-purpose labor all at once.

With multiple paths advancing in parallel, industry expectations about when a real qualitative change will occur have gradually become more rational.

Pan Jing, a deputy to the Shanghai People's Congress, said in a recent interview with Cailian Press and other media that China has unique advantages in the integrity of the robot industry chain, manufacturing foundation, and richness of application scenarios. However, the breakthrough of humanoid robots with true generalization ability still requires time. He predicts that within the next five years, there is a chance of achieving phased breakthroughs in relevant core capabilities.

This article is from the WeChat official account "Science and Technology Innovation Board Daily", author: Yang Xiaoxiao. Republished by 36Kr with permission.
