
"X Square Robot" Completes Hundred-Million-Yuan Financing and Builds the World's Largest "Embodied Intelligence Operation Foundation Model" | 36Kr Exclusive

Zhou Xinyu, 2024-11-04 15:13
The robot "brain" is making breakthroughs in generalization, versatility, self-learning, and the handling of complex tasks.

Written by Zhou Xinyu

Edited by Su Jianxun

36Kr has recently learned that the embodied intelligence startup "X Square" has completed back-to-back Pre-A and Pre-A+ rounds of financing, totaling in the hundred-million-yuan range. Investors include Delian Capital, Cornerstone Capital, Qifu Capital, and Nanshan Zhancheng Xintou. Existing shareholder Jiuhe Chuangtou increased its investment, and Yiwei Capital served as exclusive financial advisor.

The funds will go toward training the next-generation unified embodied intelligence large model and deploying it in real-world scenarios.

X Square was founded in December 2023. The company aims to build general-purpose robots by developing a general-purpose embodied intelligence large model. In early April 2024, 36Kr reported that the company had completed angel and angel+ rounds totaling tens of millions of yuan.

The founding team of X Square has a dual background in Robotics Learning and large models.

The founder and CEO, Wang Qian, graduated from Tsinghua University and was one of the first researchers in the world to introduce the attention mechanism into neural networks. During his doctoral studies, he took part in several Robotics Learning research projects at a top robotics laboratory in the United States, spanning multiple cutting-edge areas of robotics.

The co-founder and CTO, Wang Hao, pursued his doctorate in computational physics at Peking University. At the International Digital Economy Academy (IDEA) of the Guangdong-Hong Kong-Macao Greater Bay Area, he served as algorithm lead of the Fengshenbang large model team and released "Taiyi", the first open-source multimodal large model in China; "Randeng", one of the country's first 10-billion-parameter large language models; and "Jiang Ziya", a 100-billion-parameter large language model.

The robot "brain", whether the "cerebrum" for high-level perception and planning or the "cerebellum" for low-level motion control, is becoming an increasingly hot topic in the embodied intelligence sector.

Overseas, Skild AI, founded by two former Carnegie Mellon University professors, raised $300 million in July 2024, reaching a $1.5 billion valuation just one year after its founding. Physical Intelligence (PI), founded by former Google researchers together with professors from Stanford and Berkeley, is valued at $2 billion.

"Since its founding, X Square has firmly committed to the 'unified large model' technical route, the same route these two companies later announced," Wang Qian said.

However, much of the embodied intelligence large model field remains uncharted. In China, early efforts to combine 10-billion-parameter large language models with robots are still shallow, usually limited to simple voice interaction and perception and planning.

Meanwhile, no general-purpose large model that can truly handle complex manipulation in the physical world has yet emerged anywhere. Traditional robots are typically built for specific scenarios and tasks, and they struggle to adjust their strategies on their own when the environment or task changes. In the long run, the limited generalization of the models serving as robots' "brains" will hold back large-scale deployment of embodied intelligence.

Wang Qian told 36Kr that the real solution today is to train a highly generalizable, general-purpose embodied intelligence large model, that is, a unified large model.

Giving an embodied system a general-purpose foundation model means the robot's brain has learned the common structure shared across all tasks, such as the laws of the physical world, the properties of objects, and how much force the robotic arm should apply.

Compared with vertical models built for specific tasks or scenarios, a general embodied intelligence model generalizes across tasks, so developers do not need to train a model from scratch for each new task, and fine-tuning requires less training data. The resulting model can also adjust its strategy on its own as tasks and environments change.

Since its founding, X Square has iterated rapidly on its general embodied intelligence manipulation model. Just two months in, the company trained the first version, which could perform long, complex manipulation tasks such as cutting vegetables and pouring water. By mid-2024, the model had shown few-shot learning and spontaneous cross-task transfer on specific tasks.

X Square has recently built the world's largest general embodied intelligence manipulation model by parameter count to date: WALL-A, part of its Great Wall (GW) series, which follows the "unified embodied intelligence large model" technical route. Wang Qian said the model matches or exceeds state-of-the-art (SOTA) performance on multiple dimensions.

According to Wang Qian, WALL-A is distinguished by "unification" along two dimensions:

First, full vertical, "end-to-end" unification of every step. The model takes raw video, language, and sensor signals as input and directly outputs the robot's final velocity, position, and torque. One model handles the whole process, with no segmented stages in between;

Second, horizontal unification across tasks. All tasks are trained in the same model, and inference for all of them runs on that same model. In other words, a single model solves every manipulation task.

Wang Qian told 36Kr that end-to-end vertical unification avoids the noise and information loss introduced by human intervention, while horizontal unification lets the robot, like a human, draw on experience from different tasks that inform one another.
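The article does not describe WALL-A's internals. Purely to illustrate what the two kinds of "unification" mean at the interface level, here is a hypothetical Python toy: raw frames, an instruction string, and sensor readings go into one model that emits velocity, position, and torque directly, and the same model object is reused for different tasks. All names (`Observation`, `Action`, `UnifiedPolicy`) and the fixed arithmetic weights are invented assumptions standing in for a trained network; this is not X Square's model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Raw inputs for one control step (toy stand-ins for real sensor streams)."""
    frames: List[List[float]]  # camera frames, each flattened to pixel values
    instruction: str           # natural-language task description
    sensors: List[float]       # e.g. joint angles or force readings


@dataclass
class Action:
    """Low-level commands for every joint, emitted directly by the model."""
    velocity: List[float]
    position: List[float]
    torque: List[float]


class UnifiedPolicy:
    """Hypothetical toy: one set of weights, one forward pass, every task.

    The fixed weights stand in for a trained network; they do NOT
    describe WALL-A or any real system.
    """

    def __init__(self, num_joints: int, feature_dim: int = 8) -> None:
        self.num_joints = num_joints
        self.feature_dim = feature_dim
        # One weight row per output: velocity, position and torque for each joint.
        self.weights = [
            [((row * feature_dim + col) % 7 - 3) / 10.0 for col in range(feature_dim)]
            for row in range(3 * num_joints)
        ]

    def _encode(self, obs: Observation) -> List[float]:
        # Vertical unification: every modality is folded into one feature
        # vector, with no separate perception or planning stage in between.
        values = [px for frame in obs.frames for px in frame]
        values += [ord(ch) / 255.0 for ch in obs.instruction]
        values += obs.sensors
        features = [0.0] * self.feature_dim
        for i, v in enumerate(values):
            features[i % self.feature_dim] += v
        return features

    def act(self, obs: Observation) -> Action:
        # A single linear "head" maps features straight to motor commands.
        feats = self._encode(obs)
        out = [sum(w * f for w, f in zip(row, feats)) for row in self.weights]
        n = self.num_joints
        return Action(velocity=out[:n], position=out[n:2 * n], torque=out[2 * n:])


if __name__ == "__main__":
    # Horizontal unification: the same policy object serves every task.
    policy = UnifiedPolicy(num_joints=2)
    for task in ["cut the vegetables", "pour the water"]:
        obs = Observation(frames=[[0.1, 0.5, 0.9]], instruction=task, sensors=[0.0, 0.3])
        print(task, "->", policy.act(obs))
```

The contrast is with a segmented pipeline, where separate perception, planning, and control modules hand intermediate results to one another, and with per-task models, where each new task gets its own weights.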

"The breakthrough of the new generation of embodied intelligence technology shows up in generalization, versatility, self-learning, and the ability to handle complex tasks, and the unified large model embodies all of them," Wang Qian said.

He said X Square has made a series of innovations, from the underlying algorithms and framework to system-level innovation and optimization in data engineering and training engineering.

Below are comments from the investors:

Delian Capital:

Delian Capital has long been a committed backer of early-stage innovation in robotics. The shift in embodied intelligence brought about by large models will fundamentally improve robots' generalization and accelerate their adoption across scenarios. As an embodied intelligence foundation model company, X Square has proposed an innovative unified, end-to-end embodied foundation model that integrates the cerebrum and cerebellum, demonstrating the great potential of the scaling law in embodied intelligence. The X Square team has a capability that is rare in the industry: integrating Robotics Learning with multimodal large models and deeply coupling model architecture, training methods, and data pipelines, which gives it significant differentiation and competitive barriers. Delian Capital highly recognizes X Square and firmly supports its goal of becoming an industry-leading embodied intelligence foundation model company.

Cornerstone Capital:

X Square's deep understanding of and technical accumulation in embodied large models are impressive. It is one of the very few teams in China that combine hands-on experience building multimodal large models with a deep understanding of complex robot manipulation. Since its founding, the company has firmly committed to the end-to-end training paradigm, and robots running its model have shown performance leading both in China and abroad on key challenges such as understanding spatial relationships, executing long-sequence complex actions, and generalizing across scenes. We believe X Square is a professional, leading startup team with a geek spirit, a dream, and the drive to pursue it. We warmly welcome them to the Cornerstone family and will continue to firmly support the company's development and help it achieve its long-term goals.

Qifu Capital:

X Square is currently the only company in China dedicated to an end-to-end unified embodied large model, and it is also a rare native team that organically combines full experience in training language and multimodal large models with robot learning experience. This generation of embodied intelligence technology requires both breakthroughs in a new technology stack and innovation in overall engineering implementation. The company has shown strong advantages in original technological innovation, engineering innovation, and engineering execution. It is a team with the temperament of an explorer, the spirit of a scientist, and the practice of an engineer. We believe that as X Square's work on models, data, and engineering is gradually put into practice, it will deliver world-leading model performance and demonstrate the commercial potential of truly realizing a general-purpose robot.

Nanshan Zhancheng Xintou:

X Square follows an advanced end-to-end technical route and is committed to building a general-purpose embodied large model that maps perception directly to action. Its efficient data collection system lets the company iterate rapidly between data and model, keeping it at the technological forefront. Within just half a year, the company's in-house foundation model enabled robots to perform a series of complex and delicate physical operations. The team is made up of experts in robot learning and large models, giving it a clear edge over other embodied intelligence companies in China. The company's technology aligns closely with Nanshan District's artificial intelligence strategy. It is expected to address the core bottleneck in the development of the embodied intelligent robot industry and has the potential to lead the future development of embodied intelligence large models.

Jiuhe Chuangtou:

X Square has firmly adhered to the unified large model technical route from day one, has invested steadily in embodied intelligence foundation models, and has already achieved milestone results. The current model leads in on-site execution, complex task handling, and generalization. Jiuhe is once again increasing its investment in the X Square team. We expect the team, drawing on its accumulated LLM theory and hands-on experience, to keep advancing the embodied intelligence large model route and bring new changes to the embodied intelligence sector.