A startup founded by a former Beijing Academy of Artificial Intelligence (BAAI) team: Lenovo Star leads investment in a humanoid-robot large-model company | Exclusive report by Yingke
Author | Huang Nan
Editor | Yuan Silai
Yingke has learned that Beijing BeingBeyond Technology Co., Ltd. ("BeingBeyond") recently closed a financing round worth tens of millions of yuan. Lenovo Star led the round, with participation from Starlink Capital (Z Fund), Yanyuan Venture Capital, and Binfu Capital; Potential Capital served as the exclusive financial advisor. The funds will go toward core technology R&D, accelerating the iteration of existing models and industrial validation, in order to strengthen the company's technological moat and product competitiveness.
BeingBeyond was founded in January 2025 and focuses on the R&D and application of general-purpose large models for humanoid robots. Its founder, Lu Zongqing, is a tenured associate professor at Peking University's School of Computer Science. He previously headed the Multimodal Interaction Research Center at the Beijing Academy of Artificial Intelligence (BAAI) and led the National Natural Science Foundation of China's first original-exploration project on general agents. Many core team members also come from BAAI and bring extensive R&D and deployment experience in reinforcement learning, computer vision, robot control, and multimodality.
At present, data scale and generalization ability are the two core constraints on the performance of embodied "brains". On one hand, for embodied robots to achieve highly human-like action and decision-making, they depend on deep training over large volumes of diverse data, covering everything from everyday manipulation tasks to complex environment interactions, and the required data scale is growing exponentially. Data collection, however, still faces technical and resource barriers: it is labor-intensive and difficult, and storage costs climb rapidly as data volumes surge.
On the other hand, even with abundant data, robots still need strong generalization to flexibly handle new tasks, new objects, and new disturbances in unfamiliar environments. Existing models perform poorly when a scenario differs substantially from their training data: they struggle to transfer learned knowledge to new situations, and their adaptability in practical applications remains weak.
How to improve generalization ability with a limited data scale has therefore become the key challenge for embodied brains to break through their performance bottleneck and reach practical deployment.
Pretrained data used by BeingBeyond (Source/Enterprise)
Targeting the two core capabilities of humanoid robots, manipulation and locomotion, BeingBeyond divides its general large-model system into three layers: an embodied multimodal large language model, a multimodal posture large model, and a motion model, on top of which it builds a self-learning embodied-agent framework.
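The three-layer division described above can be pictured as a simple pipeline: a language layer plans, a posture layer turns each step into whole-body keyframes, and a motion layer converts keyframes into motor commands. The Python sketch below is purely illustrative; every class name, interface, and the 32-joint placeholder are assumptions made for the example, not BeingBeyond's actual design.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    rgb_frames: list   # first-person camera frames (placeholder)
    instruction: str   # natural-language task description

class EmbodiedLLM:
    """Layer 1 (toy): understands scene + instruction, emits a task plan."""
    def plan(self, obs: Observation) -> List[str]:
        return [f"step for: {obs.instruction}"]  # e.g. "reach for cup"

class PostureModel:
    """Layer 2 (toy): maps each plan step to whole-body posture keyframes."""
    def postures(self, step: str) -> List[list]:
        return [[0.0] * 32]  # one placeholder 32-joint keyframe

class MotionModel:
    """Layer 3 (toy): converts posture keyframes into motor commands."""
    def control(self, keyframes: List[list]) -> List[list]:
        return keyframes  # identity stand-in for a tracking controller

def run_pipeline(obs: Observation) -> List[list]:
    llm, posture, motion = EmbodiedLLM(), PostureModel(), MotionModel()
    commands = []
    for step in llm.plan(obs):
        commands.extend(motion.control(posture.postures(step)))
    return commands

cmds = run_pipeline(Observation(rgb_frames=[], instruction="pick up the cup"))
print(len(cmds))  # prints 1: one keyframe per plan step in this toy sketch
```

The point of the layering, as described in the article, is that each layer can be trained and improved separately while data flows top-down at run time.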
Lu Zongqing told Yingke that, unlike other models, BeingBeyond's pretraining data comes from internet videos of human movement and hand manipulation. By parsing the action sequences in these natural scenes, the company builds a pretrained foundation for the robot's motor and manipulation abilities. This route, driven by public video data, breaks the strong dependence of traditional approaches on real-robot data and enables cross-modal transfer from "human behavior demonstration" to "robot action generation".
Specifically, BeingBeyond has proposed a multimodal posture model. Abundant internet video resources, including full-body human movements such as walking and dancing as well as first-person footage of fine hand manipulation such as grasping objects and using tools, supply the model with rich and diverse action samples. From these video-action data, the model learns how various actions manifest in different environments and can perform generalized end-to-end motion and manipulation based on real-time environmental information and task requirements.
For the embodied multimodal large language model, BeingBeyond developed its own Video Tokenizer technology, which emphasizes understanding and reasoning about the spatio-temporal environment, especially first-person video content. By decomposing continuous video streams into visual token units carrying both temporal and spatial semantics, the model can accurately capture the sequential logic of actions, such as the continuous process of reaching out, raising the arm, and grasping an object, and can understand the physical world and human behavior from spatial cues such as object orientation and relative limb positions.
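To illustrate the general idea of a video tokenizer (this is a generic sketch, not BeingBeyond's actual method), the toy code below cuts a frame sequence into fixed-length clips, reduces each clip to a scalar feature, and snaps that feature to the nearest entry of a codebook, yielding one discrete token per clip. The feature function and codebook are invented for the example; real tokenizers use learned encoders and vector quantization over high-dimensional features.

```python
def clip_feature(clip):
    """Reduce a clip (list of frames, each a list of pixel values) to its mean."""
    pixels = [p for frame in clip for p in frame]
    return sum(pixels) / len(pixels)

def tokenize_video(frames, clip_len, codebook):
    """Map a frame sequence to token ids, one per non-overlapping clip."""
    tokens = []
    for start in range(0, len(frames) - clip_len + 1, clip_len):
        feat = clip_feature(frames[start:start + clip_len])
        # nearest-neighbour lookup in the codebook = scalar vector quantization
        tokens.append(min(range(len(codebook)),
                          key=lambda i: abs(codebook[i] - feat)))
    return tokens

# Two 2-frame clips: dark frames map to code 0, bright frames to code 1.
video = [[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [0.8, 0.9]]
print(tokenize_video(video, clip_len=2, codebook=[0.1, 0.9]))  # [0, 1]
```

Because each token summarizes a short time window, a downstream language model can reason over the token sequence the way it reasons over text, which is what gives the approach its "sequential logic of actions" flavor.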
Although a simple multimodal LLM plus motion-manipulation policy already meets the basic conditions for commercialization, robots' generalization ability still struggles with the dynamic environmental changes of real-world scenarios. Giving humanoid robots the ability to learn autonomously has therefore become the key breakthrough point for commercial deployment.
To this end, BeingBeyond proposed a Retriever-Actor-Critic framework. By combining retrieval-augmented generation (RAG) over real interaction data with reinforcement learning, it improves the model's response accuracy and user experience while forming a closed loop of data collection, model optimization, and effect feedback, allowing the robot to adapt dynamically to changing scenarios and providing a viable technical path to large-scale deployment.
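The closed loop described above can be sketched in miniature: a retriever looks up past interactions for the same task, the actor reuses the best-rewarded retrieved action or otherwise tries a new candidate, and the critic's reward is appended back to memory so later decisions improve. Every component here, including the candidate actions and the reward rule, is a made-up placeholder for illustration, not BeingBeyond's implementation.

```python
memory = []  # (task, action, reward) records from "real" interactions
CANDIDATES = ["grasp_b", "grasp_a"]  # hypothetical action vocabulary

def retrieve(task, k=3):
    """Return up to k past records for the same task (toy exact match)."""
    return [r for r in memory if r[0] == task][:k]

def actor(task, examples):
    """Reuse the best-rewarded retrieved action; otherwise try a new candidate."""
    good = [a for (_, a, r) in examples if r > 0.5]
    if good:
        return good[0]
    tried = {a for (_, a, _) in examples}
    untried = [a for a in CANDIDATES if a not in tried]
    return untried[0] if untried else CANDIDATES[0]

def critic(task, action):
    """Stand-in for real-world feedback: only grasp_a succeeds here."""
    return 1.0 if action == "grasp_a" else 0.0

def step(task):
    action = actor(task, retrieve(task))
    reward = critic(task, action)
    memory.append((task, action, reward))  # close the loop
    return action, reward

results = [step("grasp the red cup") for _ in range(4)]
print(results)
# step 1 tries grasp_b (fails), step 2 tries grasp_a (succeeds),
# steps 3 and 4 retrieve and reuse the successful action
```

The key property the sketch demonstrates is the loop itself: successes recorded in memory are retrieved and exploited on later attempts, which is how "data collection, model optimization, effect feedback" closes into a cycle.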
Pretraining + post - training architecture (Source/Enterprise)
Lu Zongqing pointed out that by pretraining a general action model on internet videos and then adapting it to different robot bodies and scenarios through later-stage training, BeingBeyond's approach avoids the data waste caused by hardware iteration and effectively resolves the tension between scarce real-robot data and scenario generalization. The company is currently pursuing scenario-validation partnerships with leading robot manufacturers to accelerate the deployment of embodied intelligence across more fields.
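The "pretrain once, adapt per body" idea can be illustrated with a deliberately tiny sketch: a frozen stand-in for the pretrained action model emits a generic action vector, and only a small per-embodiment adapter (a per-joint scale and offset, fitted by gradient descent on squared error) is trained to map it onto one robot's joints. The model output, calibration target, and adapter form are all assumptions made for this example.

```python
def pretrained_action(instruction):
    """Frozen general model (toy): returns a generic 3-dof action vector."""
    return [0.5, -0.2, 0.8]

def fit_adapter(pairs, lr=0.1, epochs=200):
    """Fit per-joint scale and offset so generic actions match robot targets."""
    n = len(pairs[0][1])
    scale, offset = [1.0] * n, [0.0] * n
    for _ in range(epochs):
        for generic, target in pairs:
            for j in range(n):
                err = scale[j] * generic[j] + offset[j] - target[j]
                scale[j] -= lr * err * generic[j]   # gradient of squared error
                offset[j] -= lr * err
    return scale, offset

# Calibration data for one hypothetical robot body.
generic = pretrained_action("reach forward")
target = [1.0, -0.4, 1.6]  # what this robot's joints need
scale, offset = fit_adapter([(generic, target)])
adapted = [round(s * g + o, 3) for s, g, o in zip(scale, generic, offset)]
print(adapted)  # [1.0, -0.4, 1.6] -- converges to the robot-specific target
```

Only the adapter parameters change per embodiment; the pretrained model is untouched, which is the property that lets the same foundation survive hardware iteration.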
Views of investors:
Gao Tianyao, Partner at Lenovo Star, said that the technical route for embodied large models has not yet converged; there is, for example, no unified architectural paradigm. BeingBeyond's route solves the problem of limited training-data sources, and its modular approach connects the "big brain" and "small brain" into a complete technical framework. Compared with foreign teams pursuing similar routes, it has full-stack technical capabilities. Relying on self-developed models such as its multimodal large model, it is strongly positioned to tackle task generalization, environment generalization, and cross-embodiment for embodied large models, progressively achieving "zero-shot" generalization. We look forward to seeing the team's products land in high-potential application scenarios and close the commercial loop.
Wang Pu, Partner at Starlink Capital (Z Fund), said, "As an angel investor in BeingBeyond, I am extremely proud to witness the milestone breakthroughs Professor Lu Zongqing and his team have made in general humanoid robotics. From building the industry's first million-scale MotionLib dataset to developing the end-to-end Being-M0 action-generation model, the team has not only validated the scale effect of 'big data + large models' in embodied intelligence but also achieved a technical closed loop for cross-platform action transfer. The ability to turn text instructions into fine-grained robot actions breaks through the limits of traditional methods and paves the way for robots to enter ordinary households. I firmly believe BeingBeyond will continue to lead the iteration of embodied intelligence, from dexterous manipulation to whole-body motion control, moving robots from the laboratory into daily life. We will join hands with BeingBeyond to welcome a new era empowered by general-purpose robots."