A former robotics lead at Alibaba founds a startup that raises tens of millions of yuan in seed funding, focusing on the application of L4 embodied AI technology | Hard Kr Exclusive
Author | Huang Nan
Editor | Yuan Silai
Hard Kr has learned that Hangzhou Yingshen Intelligent Technology Co., Ltd. (hereinafter "Yingshen Intelligent") has recently completed successive seed and seed+ rounds of financing totaling tens of millions of yuan. The seed round was funded by Zhuoyuan Asia; the seed+ round was jointly funded by Zhuoyuan Asia and Hangzhou West Lake Science and Technology Innovation Investment. The funds will be used to develop and train the robot's "right brain", for commercialization, and for team building.
"Yingshen Intelligent" was founded in 2024 and focuses on the research, development and application of embodied intelligence technology. Building on its self-developed large spatial model and industrial-scenario robots, it provides enterprises with low-cost, highly reliable, modular solutions that integrate software and hardware. It is starting with flexible processes in light industry and will gradually expand into the service industry and various consumer-facing (C-end) scenarios.
Min Wei, the founder and CEO, was previously the technical lead of Alibaba's former robotics team. He built Alibaba's local-services delivery robot from scratch and deployed and operated it in buildings, hospitals and hotels. Many core team members come from Tsinghua University and have years of experience in R&D and product application in artificial intelligence and robotics.
Advancing the capabilities of embodied intelligence and improving its generalization are both critical to putting the technology into practice. Advancement means that agents must exhibit more sophisticated behavior in complex physical environments, progressing from simple action execution to coordinated handling of complex tasks, with stronger environmental perception, decision-making, planning and execution. In industrial production, for example, robots must not only complete repetitive assembly tasks accurately but also adapt quickly to slight differences between parts or to ad hoc changes in the production process. Generalization means that robots can transfer skills learned in specific scenarios to new ones; a household cleaning robot, for instance, should work efficiently in many different homes.
However, because robots must contend with physical laws, diverse object properties and complex, constantly changing environments, performing reliably requires an in-depth, comprehensive and accurate understanding of the physical world. Large physical world models play a key role here.
By integrating massive multimodal data, especially visual information, such a model can learn the underlying laws and complex features of real environments through deep mining of that data. It can simulate motion, interactions and environmental changes, helping robots learn how the physical world works so that they can reason and make decisions quickly in new environments, predict the outcomes of their actions, and select the best option.
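The "predict the outcome of each action, then pick the best" loop described above can be illustrated with a minimal sketch. It assumes a hand-written toy dynamics function in place of a learned world model, a hypothetical 2-D state and velocity action, and distance-to-goal as the cost; it searches a small discrete set of candidate actions held constant over a short horizon. This is a generic model-based selection pattern, not Yingshen Intelligent's actual model or planner.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]   # hypothetical: 2-D position of an end effector
Action = Tuple[float, float]  # hypothetical: 2-D velocity command


def toy_world_model(state: State, action: Action, dt: float = 0.1) -> State:
    """Stand-in for a learned world model: predicts the next state
    by integrating a velocity command over one time step."""
    return (state[0] + action[0] * dt, state[1] + action[1] * dt)


def select_action(
    model: Callable[[State, Action], State],
    state: State,
    goal: State,
    candidates: Sequence[Action],
    horizon: int = 5,
) -> Action:
    """Imagine holding each candidate action for `horizon` steps using the
    model, and return the one whose predicted final state is closest to goal."""

    def predicted_cost(action: Action) -> float:
        s = state
        for _ in range(horizon):
            s = model(s, action)  # roll the model forward, no real execution
        return math.dist(s, goal)

    return min(candidates, key=predicted_cost)


if __name__ == "__main__":
    moves = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.0, 0.0)]
    print(select_action(toy_world_model, (0.0, 0.0), (0.5, 0.0), moves))
```

In a real system the toy model would be replaced by a learned predictor and the candidate set by sampled or optimized action sequences; the structure of "imagine, score, choose" stays the same.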
The large spatio-temporal intelligent model independently developed by "Yingshen Intelligent" builds a four-dimensional large model of the real world through a "Real to Real" approach. Through large-scale unsupervised pre-training on data, it acquires a basic ability to understand and map the physical world.
Language is a highly condensed way of expressing information. In robotics, although language models can acquire semantic understanding from large-scale text data, there is a fundamental tension between the spatio-temporal continuity of physical actions and the highly condensed, discrete nature of language symbols.
Min Wei pointed out that humans can fill in missing information through "right-brain" mechanisms such as visual perception and physical common sense, whereas a VLA (vision-language-action) model can only rely on limited vision-language alignment features for inference. This makes it prone to action errors: the instructions it generates drift away from the actual situation in the real world, undermining the accuracy and reliability of the model's output.
"This means that in the era of embodied intelligence, if robots keep using human language, they may be constrained by how that language expresses things. Once our large embodied intelligence model is smart enough, could a new language emerge that frees it from the constraints of human natural language?" Min Wei said.
Based on these considerations, "Yingshen Intelligent" models video data directly in its large spatio-temporal intelligent model, "verbalizing" the video and extracting the most authentic information straight from the footage to understand the real physical world with minimal human intervention. This approach not only improves the model's accuracy and efficiency but also reduces the information loss caused by abstracting the world into natural language.
On the data side, "Yingshen Intelligent" uses large amounts of domestic video data, which keeps training-data costs very low. According to Min Wei, the company places multiple cameras strategically in each work setting, for example cross-view cameras mounted above and in front of workers. These capture the workers' activity from different angles, and the footage is fully used for the robot's three-dimensional spatial modeling, motion capture and motion-generation model training.
One advantage of this approach is that it requires no additional complex equipment, which greatly simplifies training and avoids disrupting the factory's normal production, so production and training can run in parallel.
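To see why cameras with two different viewpoints (for example, one in front of a worker and one above) enable three-dimensional modeling, consider the textbook midpoint triangulation method: each camera turns a pixel into a ray in world space, and the 3-D point is taken as the midpoint of the shortest segment between the two rays. The sketch below assumes ideal pinhole cameras whose intrinsics (`fx`, `fy`, `cx`, `cy`) and poses (center and camera-to-world rotation `R`) are already known from calibration; the function names are hypothetical, and this is not a description of Yingshen Intelligent's actual pipeline.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pixel_to_ray(u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float, R: Mat3) -> Vec3:
    """Back-project pixel (u, v) of a pinhole camera into a world-frame
    ray direction. R is the camera-to-world rotation (3x3, row-major)."""
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    return tuple(sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + t*d1 and c2 + s*d2."""
    w0 = tuple(c1[i] - c2[i] for i in range(3))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    t = (b * e - c * d) / denom  # closest-point parameter on ray 1
    s = (a * e - b * d) / denom  # closest-point parameter on ray 2
    p1 = tuple(c1[i] + t * d1[i] for i in range(3))
    p2 = tuple(c2[i] + s * d2[i] for i in range(3))
    return tuple((p1[i] + p2[i]) / 2 for i in range(3))


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    top_down = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]  # camera looks along world -y
    ray_front = pixel_to_ray(370, 265, 500, 500, 320, 240, identity)
    ray_top = pixel_to_ray(370, 265, 500, 500, 320, 240, top_down)
    print(triangulate_midpoint((0.0, 0.0, 0.0), ray_front,
                               (0.0, 2.1, 1.9), ray_top))
```

With noisy detections the two rays rarely intersect exactly, which is why the midpoint (rather than an exact intersection) is returned; production systems typically refine such estimates with least-squares methods over more than two views.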
In this process, the large spatio-temporal intelligent model generates two types of data: one uses motion capture to record the positions and postures of the workers' joints and maps them onto the robot's joints; the other simulates video from the workers' own perspective to generate training data similar to that collected through traditional teleoperation. Both are ultimately used to train a small on-device model, which is deployed on a unified hardware platform and then applied to working robots in specific scenarios.
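The first data stream, mapping a worker's captured joint postures onto robot joints, is commonly called motion retargeting. A minimal sketch, assuming the simplest possible per-joint mapping (a linear gain and offset followed by clamping to the robot's joint limits), might look like this. The joint names, `JointMap` fields and limit values are hypothetical illustrations; real retargeting usually also has to handle differing limb lengths and kinematic structures, often via inverse kinematics.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class JointMap:
    robot_joint: str            # hypothetical name of the target robot joint
    scale: float = 1.0          # gain between human and robot joint angle
    offset: float = 0.0         # radians; compensates differing zero positions
    lower: float = -math.pi     # robot joint lower limit (rad)
    upper: float = math.pi      # robot joint upper limit (rad)


def retarget(human_angles: Dict[str, float],
             mapping: Dict[str, JointMap]) -> Dict[str, float]:
    """Convert captured human joint angles (rad) into robot joint targets,
    skipping human joints that have no robot counterpart."""
    robot_cmd: Dict[str, float] = {}
    for human_joint, angle in human_angles.items():
        m = mapping.get(human_joint)
        if m is None:
            continue  # e.g. legs when the robot only has an arm
        target = m.scale * angle + m.offset
        # Clamp so the command never exceeds the robot's mechanical limits.
        robot_cmd[m.robot_joint] = min(max(target, m.lower), m.upper)
    return robot_cmd


if __name__ == "__main__":
    mapping = {
        "right_elbow": JointMap("arm_joint_4", lower=0.0, upper=2.5),
        "right_shoulder_pitch": JointMap("arm_joint_2", scale=-1.0,
                                         offset=0.25, lower=-1.5, upper=1.5),
    }
    frame = {"right_elbow": 2.8, "right_shoulder_pitch": 0.5, "left_knee": 1.0}
    print(retarget(frame, mapping))
```

Applied frame by frame to motion-capture output, such a mapping yields robot joint trajectories that can serve as demonstration data for training.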
"Yingshen Intelligent" has already released its "Yingshen" series of industrial robots, which can operate continuously and stably under varying working conditions and generalize across them.
Min Wei told Hard Kr that, drawing on his experience in Alibaba's local-services business, "Yingshen Intelligent" is in talks with customers across multiple industries about potential cooperation and has already won industrial orders worth tens of millions of yuan. It will first focus on scenarios such as factories and then steadily expand into industries such as express delivery and hotels. The company expects to deliver more than a hundred robots in total by 2025.
In addition, this year "Yingshen Intelligent" will focus on developing the robot's brain to improve its ability to understand the external world and execute tasks, accelerating the adoption of L4-level embodied intelligence in everyday production and life.
Views of the investors:
Lin Haizhuo, founding partner and chairman of Zhuoyuan Asia, said that Yingshen Intelligent is a team that combines industry, academia and research, drawing on Alibaba and Tsinghua University. "It can look up at the stars, starting from the underlying technology and using video verbalization to let robots understand the physical world, while also keeping its feet on the ground and steadily advancing the deployment of robots in industrial scenarios. We firmly believe that Yingshen Intelligent will open up a new track in this technical field and promote the adoption and deployment of embodied intelligence technology."