A daily wage of 120 yuan, and everyone is involved in data collection. Who is training the next robot nanny?
The most important keyword in the embodied intelligence industry this year is data.
The term appears not only in the latest academic papers and in technology companies' slide decks, but also in part-time job postings in a city in Hubei and at a small training session in a village in Shaanxi.
Here, data means all the information about what actually happens in the human world: how to fold clothes, how to water flowers, how to cook, and so on. In this sense, every ordinary person who can move and act is a teacher for robots.
As models iterate and algorithms advance, what looks like the most cutting-edge technology has turned into a large-scale experiment that the general public can join. The experiment involves many roles, including robot manufacturers, data vendors, human resources companies, and flexible workers, and it changes fast: the expensive recording equipment issued this month is replaced by mobile phones the next.
Yibang AI has conducted in-depth research on the current state of embodied intelligence data collection. It hopes not only to keep pace with the technology but also to see ordinary people share in the industry's dividends.
This article is about 8,000 words long and is divided into five parts. For your convenience, here is a summary:
1. A daily wage of 120 yuan, the wave of public data collection is coming
Crowdsourced data collectors are being recruited on a large scale across the country. They use equipment to record videos at home or outdoors.
2. Data! Data! Data!
The embodied intelligence industry is starving for data. Non-ontology data collection, meaning collection without a robot body, has erupted since March 2026.
3. The "Warring States Period" of data collection equipment
Data collection equipment has gone through three iterations, and first-person human video has attracted the most attention.
4. A business with a 100% gross profit
The data collection business has a high gross profit, but crowdsourcing also faces many problems.
5. Technological shift: VLA or world model?
Behind the surge in crowdsourced data collection are a change in algorithm routes and a push from capital.
A daily wage of 120 yuan, the wave of public data collection is coming
"I'm going to start folding T-shirts." Zhang Yue, a native of Hebei, stood by her bed wearing an electronic headband with her iPhone clamped to it and holding two special grippers, and announced her task to the empty room with some ceremony.
Folding clothes is normally easy, but controlling the angle of the grippers is not, and the corners kept coming out uneven, so she had to try again and again. After four or five pieces, her palms began to feel sore.
Once she finished folding on the bed, a voice from the phone prompted her to change scenes. That day, Zhang Yue folded clothes in the bedroom, the study, and the living room; on the table, on the bed, and on the floor; by the window, under lamplight, and in natural light. The phone also reminded her to fold clothes of different styles and colors and not to fold the same piece repeatedly.
Embodied intelligence data collection assistant App
What Zhang Yue is doing is embodied intelligence data collection: her clothes-folding motions are captured as data by the phone camera and the grippers. After labeling and processing, they become training material for robots, until one day robots learn to fold clothes too.
In 2026, robots that could do martial arts and run captured the whole country's attention. Getting robots to do actual work, however, remains hard, and the biggest obstacle is data. So where does the data come from? The magic weapon of the revolution is to mobilize the masses.
Zhang Yue, in her early 30s, is a full-time mother who takes part-time jobs to supplement the family income. In March, she found this embodied data collection job in a part-time job group she knew well: work from home, 30 yuan per hour. After signing up, she attended a half-day training session that covered downloading the collection software (a self-developed app that is not listed on the App Store and can only be installed and registered on-site with back-end authorization), learning to use the grippers, and shooting videos to specification. After half an hour of practice on her own, she could take the equipment home.
The collection software issues a range of tasks: cleaning (sweeping and mopping the floor, cleaning windows, washing dishes, cleaning the desktop or bathroom), clothing care (folding, hanging out to dry, storing, ironing), organizing (placing items, sorting and grouping, picking and placing goods, arranging bookshelves, tidying desks, arranging shelves), cooking (washing and cutting vegetables, clearing up after meals, using kitchen utensils, making drinks, cooking), and daily care (opening and closing doors and windows, handing over items, taking out the trash, watering flowers, pet care). Handicrafts such as home decoration, building blocks, embroidery, and paper folding were added later.
Zhang Yue receives tasks every day, finds a suitable scene, opens the app, and does housework with the grippers while recording. She is required to shoot for at least 8 hours a day, with each video at least 2 minutes long, and she uploads the videos in batches afterwards. Less than half of those 8 hours counts as effective duration. At 30 yuan per hour, her daily income is about 120 yuan.
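The arithmetic behind the 120-yuan figure can be sketched as follows. This is a minimal illustration, assuming that pay is computed on effective duration only and that about 4 of the 8 recorded hours are accepted; the function name `daily_income` and the 4-hour figure are illustrative assumptions, not details of any collection platform.

```python
def daily_income(effective_hours: float, hourly_rate_yuan: float = 30.0) -> float:
    """Pay for one day, assuming only effective recording time is paid."""
    if effective_hours < 0:
        raise ValueError("effective_hours must be non-negative")
    return effective_hours * hourly_rate_yuan


recorded_hours = 8.0   # minimum daily shooting time stated in the article
effective_hours = 4.0  # assumption: roughly half of the recorded time is accepted
income = daily_income(effective_hours)
effective_share = effective_hours / recorded_hours

print(f"effective share: {effective_share:.0%}, income: {income:.0f} yuan")
```

Under these assumptions, the pay per hour actually spent recording is about half the headline hourly rate.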
A Xin, a native of Hubei, remembers that sporadic recruitment posts for embodied intelligence data collection began to appear in part-time job groups as early as November 2025. By March 2026, recruitment for data collectors had exploded. Some jobs required going to a centralized venue to teleoperate robots, for 180 to 250 yuan per day. Others allowed working from home with wearable devices (headbands, grippers, and so on), for 120 yuan per day plus performance pay.
Since March this year, data collection companies have been recruiting crowdsourced data collectors on a large scale across the country through human resources outsourcing firms. For ordinary people who don't understand how robots work, data collection is a novel job with a low barrier to entry. In lower-tier cities, a daily wage of 120 yuan is also not bad.
One woman said that data collection recruiters organized a training session in her village in March. She shot a half-hour video as required and submitted it, but because too few villagers took part, the recruiters later left and took the gripper equipment with them. Others said they had completed the training but could not join the collection because there were not enough grippers. Some people even went online to ask that their retired mothers be given a chance to take part in this cutting-edge industry.
Meimei, who works in HR at a human resources outsourcing company, told Yibang AI that the company currently plans two phases of data collection projects. The first phase focuses on home scenarios. The second, named "World Interaction", focuses on daily behavior in outdoor public places, including cycling, walking, exercising in the park, shopping, strolling, picking up parcels, taking out the trash, and walking the dog. "In theory, any outdoor activity can be recorded, but to avoid camera shake, strenuous or competitive sports are not recommended," Meimei said.
The recording requirements are to turn on audio recording to capture ambient sound and to interact with the environment at least once every 3 minutes. Other people may occasionally appear in the frame, but prolonged filming of others should be avoided because anonymizing the footage afterwards is very troublesome.
A KFC clerk in Beijing is using a gripper to clean the table while collecting data
Overseas data collection is just as busy: companies such as Micro1 and Scale AI have recruited part-time workers worldwide to record videos of housework, and workers in countries such as Kenya, the Philippines, and India wear head-mounted cameras. In March, DoorDash launched the Tasks app, which lets its delivery drivers record housework videos on the side.
A spectacular wave of public data collection has begun.
Data! Data! Data!
This public data collection stems from the embodied intelligence industry's current hunger for data.
"The big buyers are in a state of 'the more you have, the more I'll buy, and I'll take it as soon as you have it'," said Yao Maoqing, a partner at Zhiyuan Robot and the chairman and CEO of Mifeng Technology. Embodied intelligence "brain" companies, robot body manufacturers, and multimodal large-model and world-model companies all need data. The datasets currently available on the market total only hundreds of thousands of hours, and high-quality data is in serious short supply.
The mainstream view is that training an embodied large model that can generalize requires at least 10 million hours of data, and that emergent intelligence requires 10 billion hours. A human baby goes from birth to walking, talking, dressing, eating, and doing housework by observing, imitating, and practicing repeatedly in the real environment; that is the only way to acquire the skills. Likewise, robots cannot skip this process if they are to do laundry, cook, and clean at home as a nanny does.
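To give a sense of scale, the 10-million-hour threshold can be converted into collector-days. This is a rough sketch, assuming each crowdsourced collector yields about 4 effective hours per day (the figure implied by 120 yuan a day at 30 yuan per hour); the function name and the 10,000-person workforce are hypothetical values chosen for illustration.

```python
import math


def collector_days_needed(target_hours: int, effective_hours_per_day: float) -> int:
    """Collector-days required to reach a target number of effective hours."""
    if effective_hours_per_day <= 0:
        raise ValueError("effective_hours_per_day must be positive")
    return math.ceil(target_hours / effective_hours_per_day)


TARGET_HOURS = 10_000_000        # generalization threshold cited in the article
EFFECTIVE_HOURS_PER_DAY = 4.0    # assumption: about half of an 8-hour shooting day
WORKFORCE = 10_000               # hypothetical number of active collectors

total_collector_days = collector_days_needed(TARGET_HOURS, EFFECTIVE_HOURS_PER_DAY)
calendar_days = math.ceil(total_collector_days / WORKFORCE)

print(f"{total_collector_days:,} collector-days, "
      f"or {calendar_days} days for {WORKFORCE:,} collectors")
```

The estimate ignores rejected footage, labeling time, and equipment shortages, so it is a lower bound on the effort rather than a forecast.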
Autonomous driving also had to build up its data from nothing. The first-generation autonomous driving dataset nuScenes contained 1,000 manually labeled scenes, yet its total duration was only 5.5 hours. Automakers led by Tesla instead collect data through the mass-produced cars they have already sold. This cheaper, higher-volume approach has let Tesla's driver-assistance system accumulate about 10 billion kilometers of driving data, but the process took nearly 10 years.
Compared with autonomous driving, data accumulation for embodied intelligence has clearly only just begun. He Hongling, the chief operating officer of DataTang, told Yibang AI that embodied intelligence and autonomous driving are trained in similar ways. The difference is that cars can be sold first and then used to accumulate data, iterate algorithms, and update driving capabilities. "There is no such logic in embodied intelligence. I can't buy a robot that can't do anything and then remotely operate it to work."
He Hongling said that, based on the data demand he is aware of, home scenarios account for 80%, shopping malls for 10%, and factories for 10%. The reason is not hard to understand. Long-standing needs such as housework, caregiving, and elderly care urgently call for new technological solutions, and the complexity and diversity of household tasks, together with the privacy and security issues they raise, mean robots need more learning and training. In shopping malls, labor costs are still acceptable, so there is no strong incentive to replace workers in the short term. Factories already have mature automation solutions and relatively standardized tasks, so their demand for embodied intelligence data is small.
This is a chicken-and-egg problem: accumulating data requires robots to enter households on a large scale, while making robots usable depends on sufficient training with home-scenario data.
Embodied intelligence data is generally divided into three layers. The bottom layer is Internet video and simulated synthetic data. The middle layer is non-ontology (off-robot) data, which data collectors produce by performing specific tasks while wearing devices. The top layer is real-machine data, which staff produce by teleoperating robots. Moving from the bottom to the top of this data pyramid, quality rises, quantity falls, and cost increases.
Before this year, the mainstream method was centralized data collection factories, where robot bodies of various brands repeated work tasks over and over. More than 20 cities in China have built such factories. Since March 2026, however, non-ontology (off-robot) data collection has spread like a prairie fire.
Zhu Kai, the deputy director of the Tianji Laboratory at Ant Digital Technology, said existing research has verified a priority order for diversity in embodied intelligence training data: task diversity > manipulated-object diversity > scenario diversity.
"We have seen a common trend on the data demand side of cutting-edge embodied models: the massive demand is concentrating on non-ontology data, that is, ego/UMI data. The training data mix for general embodied models is moving towards '90% ego + 10% real-machine', and some more radical teams are even exploring an extreme '99% ego + 1% real-machine' ratio," Zhu Kai said bluntly. "This means that the pace on the data side determines when the breakthrough comes on the model side, and the capacity to supply ego data at scale will directly determine how soon the GPT-3 moment arrives. So the question becomes: who can continuously supply high-quality ego data on the scale of millions or even tens of millions of hours?"
The "Warring States Period" of data collection equipment
In the past two years, robot data collection equipment has gone through three iterations. The first was real-machine teleoperation, in which people use VR headsets, gloves, and other devices to control a real robot as it learns tasks. The second was UMI (Universal Manipulation Interface), in which collectors hold a general-purpose gripper fitted with a GoPro camera and perform tasks slowly and repeatedly in real scenarios to record motion trajectories. The third, which has emerged in the past two months, is first-person human video (egocentric video), which needs only a mobile phone or camera to record everyday hand operations for training the robot brain.
Each iteration has moved towards lower cost, a lower barrier to entry, and more convenient collection, which has widened participation from professionals to the general public.
The UMI route started at the end of 2025. Overseas embodied intelligence companies trained models such as Generalist's GEN-0 and Sunday's ACT-1 on UMI data, providing initial proof that the path can work. Manufacturers in China and abroad quickly followed, and a stream of UMI devices appeared: grippers, wrist-mounted cameras, gloves, headbands with mobile phones, headbands with grippers, and more.
In March this year, Luming Robot released the FastUMI non-ontology data collection product line, which includes the gripper-type hardware FastUMI Pro, the backpack-style data collection device FastUMI Go, the head-mounted hardware FastUMI Ego, and the 6-axis collaborative robotic arm FastUMI Touch.
Luming collected 100,000 hours of data through its self-built data collection station in 2025. According to Zhao Guangzhi, the co-founder of Luming, the company's data collection will proceed in two steps: in 2026, it will work with government and industry partners to build data collection factories and reach a production capacity of 1 million hours; in 2027, it will reach 10 million hours through crowdsourcing incentives.
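The growth implied by Luming's stated milestones can be checked with a few lines of arithmetic. This is only a sketch built on the three figures quoted above; the variable names are illustrative, and the 10-million-hour threshold is the "mainstream view" cited earlier in the article, not a Luming target.

```python
# Hours of data per year as quoted in the article: 2025 is the amount
# actually collected, while 2026 and 2027 are planned production capacity.
milestones_hours = {
    2025: 100_000,
    2026: 1_000_000,
    2027: 10_000_000,
}

GENERALIZATION_THRESHOLD_HOURS = 10_000_000  # "mainstream view" figure

years = sorted(milestones_hours)
growth_factors = [
    milestones_hours[later] / milestones_hours[earlier]
    for earlier, later in zip(years, years[1:])
]

# First year in which the planned figure meets the threshold, if any.
first_year_at_threshold = next(
    (y for y in years if milestones_hours[y] >= GENERALIZATION_THRESHOLD_HOURS),
    None,
)

print(growth_factors, first_year_at_threshold)
```

In other words, the plan assumes a tenfold increase in each of two consecutive years, and only the crowdsourced 2027 stage reaches the scale described as necessary for generalization.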