
Learning from Human Video Data, "Zero Power Technology" Completes Two Humanoid Robots in Four Months | Early-stage Project

Huang Nan · 2024-10-24 10:15
As the imagination generation model becomes more capable, less real data is needed, more generated data is used, and generalization improves accordingly.

Author|Huang Nan

Editor|Yuan Silai

Data has long been central to technological progress. However, because it is so hard to collect, training data has become a major obstacle to deploying robots at scale.

Robot training data currently falls into three broad categories: real teleoperation data, high-quality synthetic simulation data, and human behavior data drawn mainly from Internet videos.

Robot Training Data (Source: Zero Power Technology)

As scaling laws have been validated in artificial intelligence, large language models and end-to-end large models with fast inference, exemplified by Tesla FSD (Full Self-Driving), have brought new inspiration to the field of embodied intelligence.

Whether embodied intelligence is pursued through multimodal large models, by building a high-dimensional world model, or by combining multiple small domain-specific models, the core question is whether the demand for diverse, high-quality data can be met. With this question in mind, "Zero Power Technology", an embodied intelligence startup that Hardcore recently spoke with, has proposed its solution: learning from human video data (Learn from Human Video).

"Zero Power Technology" was founded in May this year and is jointly incubated by Tsinghua University and the Jianghuai Frontier Technology Collaborative Innovation Center. Its core members come from the Tsinghua University AI & Robot Intelligent Robot Laboratory, and many key team members have worked at Internet giants such as ByteDance and Baidu and at collaborative robot companies such as JAKA.

Data volume is a prerequisite for general-purpose embodied manipulation, and unit data cost is a basic condition for bringing products to market. In practice, both real teleoperation data and simulation data are expensive per unit, while human data is the most plentiful but also the lowest in quality. Making good use of massive amounts of human video data has therefore become the main technical path that "Zero Power Technology" is exploring.

Data Volume and Application Scenarios (Source: Zero Power Technology)

Unlike the current mainstream teleoperation approach, "Zero Power Technology" uses a 3D human pose estimation model that extracts the motion of key human joints and remaps it onto the robot, which reduces the algorithm's learning cost to some extent. At the same time, it reconstructs the robot's observations using 4D Gaussian Splatting and uses them to obtain a diffusion-model-based action generation policy. This policy drives the robot to complete tasks autonomously and gives it the ability to learn directly from humans (LFWH).
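The article does not describe how the extracted human joint motion is remapped onto the robot. As a minimal sketch, assuming a simple per-joint mapping (scale, offset, then clamp to the robot's joint limits), retargeting one frame of estimated human joint angles could look like the following. `JointMap`, the joint names, and all numeric values are hypothetical; practical systems often solve inverse kinematics on keypoint positions instead.

```python
from dataclasses import dataclass


@dataclass
class JointMap:
    """Hypothetical mapping from one human joint angle to one robot joint."""
    robot_joint: str
    scale: float   # accounts for differing joint conventions (e.g. sign flips)
    offset: float  # radians; aligns the human and robot zero positions
    lower: float   # robot joint lower limit, radians
    upper: float   # robot joint upper limit, radians


def retarget(human_angles: dict[str, float],
             mapping: dict[str, JointMap]) -> dict[str, float]:
    """Map estimated human joint angles (radians) to robot joint targets."""
    targets: dict[str, float] = {}
    for human_joint, m in mapping.items():
        if human_joint not in human_angles:
            continue  # joint not detected by the pose estimator in this frame
        q = m.scale * human_angles[human_joint] + m.offset
        # Clamp so the target always respects the robot's mechanical limits.
        targets[m.robot_joint] = min(max(q, m.lower), m.upper)
    return targets


# Illustrative mapping for two right-arm joints (values are made up).
MAPPING = {
    "r_elbow": JointMap("right_elbow_pitch", scale=-1.0, offset=0.0,
                        lower=-2.5, upper=0.0),
    "r_shoulder_flex": JointMap("right_shoulder_pitch", scale=1.0, offset=0.1,
                                lower=-3.0, upper=1.0),
}

frame = {"r_elbow": 1.2, "r_shoulder_flex": 1.5}
print(retarget(frame, MAPPING))
```

Clamping is the crudest way to absorb differences between human and robot bodies; whatever it cannot absorb is the kind of gap the article's simulation-based reinforcement learning step is meant to address.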

Test results show that, building on LFWH, the robot can further undergo reinforcement learning in simulation to achieve more flexible and generalizable manipulation. This compensates for the inherent structural differences between robots and humans, allowing the robot to exceed the work efficiency of human experts and to be deployed quickly.
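The company does not say what reward its simulation training uses. One common pattern for refining an imitated behavior with reinforcement learning, shown here only as a hypothetical sketch, is to reward task success, add a term for tracking the retargeted human reference, and anneal that tracking weight so the policy becomes free to depart from the human's motion where the robot's body favors a different, faster strategy. `shaped_reward`, `anneal`, and all constants are illustrative assumptions.

```python
import math


def shaped_reward(task_success: bool, robot_q: list[float],
                  human_ref_q: list[float], step_time: float,
                  tracking_weight: float) -> float:
    """Hypothetical per-step reward for RL fine-tuning after imitation.

    - task term: sparse bonus when the task is completed
    - tracking term: exp(-squared joint error) to the human reference,
      scaled by tracking_weight so it can be annealed toward 0
    - time penalty: rewards finishing faster than the demonstration
    """
    sq_err = sum((r - h) ** 2 for r, h in zip(robot_q, human_ref_q))
    tracking = math.exp(-sq_err)
    task = 10.0 if task_success else 0.0
    return task + tracking_weight * tracking - step_time


def anneal(weight0: float, step: int, half_life: int) -> float:
    """Halve the tracking weight every `half_life` training steps."""
    return weight0 * 0.5 ** (step / half_life)


# Early in training the human reference dominates; later it fades out.
for step in (0, 1000, 2000):
    w = anneal(1.0, step, half_life=1000)
    print(step, w, shaped_reward(False, [0.2, 0.1], [0.0, 0.0], 0.01, w))
```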

The company trains an imagination generation model (Imaginator) on data from specialized scenarios and on Internet data. As the Imaginator improves, less real data is needed for model training, the share of generated data keeps rising, and the algorithm's generalization ability increases. The Imaginator's output thus comes ever closer to the real world, approaching a world model governed by real physical laws, with the ultimate goal of achieving true artificial general intelligence.

Imagination Generation Model (Source: Zero Power Technology)
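The article says the share of generated data rises as the Imaginator improves but gives no schedule. A hypothetical linear schedule, assuming a scalar `imaginator_score` in [0, 1] that summarizes how realistic the generated data is and a floor `min_real` of real data that is always kept, could split each training batch like this:

```python
def real_fraction(imaginator_score: float, min_real: float = 0.1) -> float:
    """Hypothetical schedule: fraction of real samples in a training batch.

    imaginator_score in [0, 1] is a stand-in for how realistic the generated
    data is judged to be. At score 0 all data is real; at score 1 only
    `min_real` of each batch is still real data.
    """
    s = min(max(imaginator_score, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - s * (1.0 - min_real)


def batch_composition(batch_size: int, imaginator_score: float,
                      min_real: float = 0.1) -> tuple[int, int]:
    """Split a batch into (real, generated) sample counts."""
    n_real = round(batch_size * real_fraction(imaginator_score, min_real))
    return n_real, batch_size - n_real


for score in (0.0, 0.5, 1.0):
    print(score, batch_composition(256, score))
```

Keeping a nonzero `min_real` is a design choice in this sketch: it anchors training to real observations even when the generator looks good, so errors in the generated data cannot compound unchecked.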

Take the dual-arm robot F1, released by "Zero Power Technology" in September, as an example. F1 is currently being piloted in specialized operations at professional factories. By gradually expanding into new scenarios and keeping the data flywheel turning, the generalization ability of both the model and the robot can be effectively improved.

To address problems such as the difficulty of retrofitting factories at small and medium-sized enterprises and the reliance on manual programming, "Zero Power Technology" proposes in-place replacement through the robot's autonomous learning. Neither the factory layout nor the programming needs to change: the robot is simply placed at a human workstation, and its generalization within that scenario improves through continuous learning and a steady accumulation of data.

Using Robots to Simulate Human Behaviors for Data Training (Source: Zero Power Technology)

On the hardware side, "Zero Power Technology" has completed the research and development of two humanoid robots in four months. In addition to the F1 dual-arm robot mentioned above, the company officially released its first humanoid robot, Z1, on October 24. Z1 can walk stably for long periods over a variety of irregular surfaces and complex terrain and has strong disturbance rejection: it keeps standing steadily even when struck hard from any direction.

Z1 is equipped with 150 N·m joint motors, has 27 degrees of freedom across its whole body, and reached a load-test limit of 20 kg, giving a payload-to-self-weight ratio of over 70%. It carries a self-developed EtherCAT communication module, so the overall system achieves low latency and high bandwidth. The team also used AI to assist in designing the robot's structural parameters, resulting in lower energy consumption and better dynamic motion performance.
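The article does not state Z1's weight, but the two load figures bound it. Assuming the ratio is defined as payload divided by self-weight and uses the 20 kg test limit:

```latex
r = \frac{m_{\text{load}}}{m_{\text{robot}}} > 0.7,
\qquad m_{\text{load}} = 20\ \text{kg}
\;\Longrightarrow\;
m_{\text{robot}} < \frac{20\ \text{kg}}{0.7} \approx 28.6\ \text{kg}
```

Under that reading, the stated figures imply a robot weighing less than about 28.6 kg.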

"Zero Power Technology" is currently training the robot to reproduce scenes from the movie "Real Steel" and has achieved accurate replication of human arm movements. In officially released video footage, Z1 observes human behavior, imitates it to learn attack moves, and defends dynamically using its flexible whole-body coordinated control.

Founder Min Yuheng said the company plans to hold a humanoid robot boxing match by the end of this year in which its robots fight without any operating equipment, bringing everyone's mecha dreams to life.