We went to see Zhiyuan Robot's "Data Collection Plant". Wait! Isn't this the set of "Star Wars"?
Written by | Tian Zhe
Edited by | Su Jianxun
In early 2025, we learned that the domestic embodied intelligence unicorn "Zhiyuan Robot" (the company whose CTO is the well-known Bilibili uploader "Zhihui Jun") had built a "Data Collection Factory" in Shanghai (referred to below as the "Data Collection Plant").
What is it? Why build it? How is the data collected? A string of questions came to mind, so we decided to go and take a look.
Of course, before going, we had a rather stereotypical picture of how a "Data Collection Plant" works: a dim room filled with black server boxes, where programmers with dark circles under their eyes and worrying hair loss mechanically type code on their keyboards...
Wrong, wrong, wrong! When "Intelligent Emergence" arrived at Zhiyuan Robot's Data Collection Plant in Pudong, Shanghai, we found that the reality was completely different from what we had imagined!
It is no exaggeration to say that the place looks just like the set of the American movie "Star Wars"!
"Star Wars" movie poster; Source: Internet
In this 3,000-square-meter Data Collection Plant, themed rooms occupy most of the floor area. Each room carefully recreates the layout of objects in real life, and robots perform different tasks in different scenarios.
In the bedroom, the robot obediently learns to fold clothes.
Robot learning to fold clothes; Source: Zhiyuan Robot
In front of the dining table, the robot neatly arranges the tableware one by one.
Robot arranging dinner plates; Source: Filmed by "Intelligent Emergence"
The robot also needs to learn to serve various dishes without its hands shaking.
The robot is using a spoon to scoop eggs; Source: Filmed by "Intelligent Emergence"
And at the supermarket checkout, the robot holds a scanner in one hand and scans the goods with the other.
Zhiyuan Robot is learning to scan goods; Source: Filmed by "Intelligent Emergence"
After the visit, "Intelligent Emergence" met Yao Maoqing, the person in charge of the Data Collection Plant. He is also the president of Zhiyuan Robot's Embodied Product Line and the executive dean of the Research Institute, responsible for the research and development of data-driven embodied intelligent products.
Previously, Yao Maoqing was responsible for the research and development of perception algorithms and end-to-end large models at companies such as Waymo and NIO.
Yao Maoqing told "Intelligent Emergence" that each action a robot completes counts as one piece of data. The data is uploaded to the cloud through the robot's host computer, and the Zhiyuan Robot team uses it to train the robot's large model, so that the robot can truly master a skill, such as making coffee or ironing clothes.
To help the robots learn skills quickly, Zhiyuan has assigned them one-on-one teachers: data collectors. They are all energetic young men and women. To teach the robots well, the collectors also need good body coordination and precise, standard movements.
Holding the control equipment, the data collectors guide the robots hand in hand through actions such as grasping, holding, and releasing. Sometimes they also wear VR equipment so that the robot can imitate and learn human actions more accurately.
It is understood that the Zhiyuan Data Collection Plant now has nearly 100 robots in use and collects an average of 30,000 to 50,000 pieces of data per day.
In order to allow the robot to master as many skills as possible in different environments as quickly as possible, the Zhiyuan Data Collection Plant simulates five scenarios: home, retail, service industry, catering, and factory.
In the supermarket here, you can find not only various snacks but also wine and cigarettes, and even the fruits and vegetables are marked with prices.
Supermarket simulated by Zhiyuan Robot; Source: Filmed by "Intelligent Emergence"
There is also a group of robots spread across their own "workstations", learning simple skills such as folding clothes at a table.
Robots are learning different skills at the workstations; Source: Filmed by "Intelligent Emergence"
It is understood that the Data Collection Plant will be expanded by another 1,000 square meters, making room for more scenarios as well as simulation scenarios customized to customer needs.
At present, however, few robot companies in the industry build such diverse scenarios. That raises a question: how did Zhiyuan Robot come to be so determined to build a Data Collection Plant?
Building a Data Nourishment Field for Embodied Intelligent Robots
For most startups, spending heavily to build a factory just for data collection would undoubtedly be a huge risk, but Zhiyuan Robot does not seem to have hesitated, completing the construction of the Data Collection Plant in just over a month.
What prompted Zhiyuan Robot to build the Data Collection Plant at such cost was the huge shortage of usable data available in the industry.
In June 2024, Zhiyuan Robot decided to develop an embodied intelligence large model for robots, and training such a model requires a massive amount of data.
Yao Maoqing told "Intelligent Emergence" that a robot learns a skill from hundreds of pieces of data, and the actions involved are often long-horizon tasks, such as making coffee and ironing clothes.
The team once looked for open-source datasets in the industry, but found that high-quality, uniformly formatted data barely exists. Although the industry has open-sourced real-robot training datasets with millions of entries, that data was collected by different companies on robots of different models and specifications, and its quality is too low to meet Zhiyuan's requirements.
Yao Maoqing said that data from different sensors and robot forms differs too much, which weakens the overall training effect. For example, data from a six-axis robotic arm can hardly be reused on a seven-axis dexterous robotic hand, so data collected to a uniform standard is needed.
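To make the format problem concrete, here is a minimal sketch in Python. It is an illustration only, not Zhiyuan's actual data schema: the `Episode` record, its fields, and the robot labels are all hypothetical. It groups recorded episodes by action dimension and sensor set, showing why six-axis and seven-axis recordings end up in separate pools rather than in one uniform training set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One recorded robot action (hypothetical schema for illustration)."""
    robot_model: str       # hypothetical label for the robot that recorded it
    action_dim: int        # number of controlled axes per action vector
    sensors: frozenset     # names of the sensors present in the recording


def split_by_format(episodes):
    """Group episodes by (action_dim, sensors).

    Episodes in different groups have incompatible action vectors or
    observations, so they cannot simply be stacked into one training batch.
    """
    groups = {}
    for ep in episodes:
        key = (ep.action_dim, ep.sensors)
        groups.setdefault(key, []).append(ep)
    return groups


# Made-up sample: two six-axis recordings and one seven-axis recording.
episodes = [
    Episode("arm_a", 6, frozenset({"rgb"})),
    Episode("arm_a", 6, frozenset({"rgb"})),
    Episode("arm_b", 7, frozenset({"rgb", "depth"})),
]

groups = split_by_format(episodes)
print(len(groups))  # 2 separate pools instead of one uniform dataset
```

Collecting all data on the same robots with the same sensors, as the plant does, keeps every episode in a single group.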
How Zhiyuan made up its mind to build the Data Collection Plant is also simple.
Yao Maoqing said that Zhiyuan Robot had collected several thousand pieces of data to train its algorithm. That was enough for the robot to complete a given action successfully, but the skill did not generalize: changing the object type, the color, or even the lighting affected the robot's ability to complete the same action again. Zhiyuan Robot therefore decided to build a factory to collect data on a large scale.
Robots in different rooms are collecting data; Source: Filmed by "Intelligent Emergence"
In the future, the Data Collection Plant will keep supplying data to nourish robot learning. It is understood that a little over two months after going into use, the Zhiyuan Data Collection Plant has collected more than one million pieces of real-robot data across more than one thousand collection tasks. Each task contains several hundred pieces of data, and some particularly difficult long-horizon tasks can reach several thousand.
"Soon we will have more than ten million pieces of data," Yao Maoqing said with a smile.
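How soon is "soon"? A rough back-of-envelope sketch in Python, using only figures quoted in this article: about one million pieces already collected, 30,000 to 50,000 pieces per day, and targets of ten million and (Yao Maoqing's upper estimate, mentioned later) 100 million pieces. It assumes the daily rate stays constant, which is a simplification, since the plant is being expanded; the function name is our own.

```python
def days_to_target(target, collected, per_day):
    """Days of collection needed to go from `collected` to `target` pieces,
    assuming a constant daily rate (rounded up to whole days)."""
    remaining = max(target - collected, 0)
    return -(-remaining // per_day)  # integer ceiling division


COLLECTED = 1_000_000                    # "more than one million" so far
DAILY_LOW, DAILY_HIGH = 30_000, 50_000   # quoted daily collection range

for target in (10_000_000, 100_000_000):
    slow = days_to_target(target, COLLECTED, DAILY_LOW)
    fast = days_to_target(target, COLLECTED, DAILY_HIGH)
    print(f"{target:,} pieces: {fast} to {slow} days")

# 10,000,000 pieces: 180 to 300 days
# 100,000,000 pieces: 1980 to 3300 days
```

At today's pace, ten million pieces is roughly six to ten months away, while 100 million would take years, which is one reason adding robots and floor space matters.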
Exploring the Scaling Law of Robots
After collecting tens of thousands of repeated robot actions, Zhiyuan Robot got some pleasant surprises: the robot can control how much water it pours as requested without having been trained to do so, and it can learn to fold pants after being taught only a few dozen times.
This is exactly the kind of robot that Zhiyuan Robot wants to create - a robot that can independently understand human instructions and the external environment, and can adapt to complex environments.
Zhiyuan hopes that the robot can replace coffee machine accessories of different brands and models; Source: Filmed by "Intelligent Emergence"
Over the past few decades, robot control has mostly relied on rules preset by people: situation descriptions and response rules are fed to the robot, which then carries out the operation matching the situation. But the situations robots encounter are ever-changing, and pre-programmed rules can hardly prepare a robot for all of them.
After the explosion of large model applications, the cold bodies of robots have gained intelligence and become able to understand the world and humans. What Zhiyuan Robot is developing is a robot driven by an end-to-end large model, which offers stronger general capabilities and faster responses.
Going from receiving an instruction to completing an action usually takes a robot three steps: perceiving the external environment, making a decision, and controlling its limbs to perform the task. Information can be distorted as it passes along this chain, which affects how well the robot completes the action.
An end-to-end large model, by contrast, does not need to be split into modules and does not rely on precise measurement. It is like a human driver overtaking another car: nobody gets out to measure the distance between the two cars first.
Zhiyuan Robot's vision for the end-to-end large model robot is that it can take complex human instructions, such as fetching a mobile phone from a distance or taking a bag of potato chips out of the refrigerator. These instructions test not only the robot's understanding of the task but also its ability to identify the object, move to the right location, pick the object up, return, and hand it over.
But getting there is not easy. Yao Maoqing said that the large model must be fed data continuously: the more data there is, the closer the model's performance in a given scenario comes to that of humans. He estimates that tens of millions to 100 million pieces of data are needed, and that the Scaling Law for robots is still far from arriving.
"Intelligent Emergence" learned that robots require a combination of software and hardware, and it is difficult for robot technology to develop rapidly with only one of them. The hardware cost in the United States is relatively high, so most American robot startups only develop algorithms. China already has a supply chain advantage. Combining data and self-developed hardware will enable rapid iterations of algorithms, hardware, and software.
Yao Maoqing believes that the overall progress of China's robot technology is on par with that of the United States, because the labor cost in the United States is ten times higher than that in China, and various parts also need to be purchased from China.
With lower costs and efficient iteration speeds, Zhiyuan Robot has expanded the scale of scene simulation and data collection. Those technologies that seem "out of reach" for American robot companies are gradually becoming a reality in the continuously flowing data in China's Data Collection Plant.