
This robotics company stuffed "embodied data" into 10,000 backpacks.

邱晓芬 (Qiu Xiaofen), 2026-02-12 20:05
Once the race over model architectures has run its course, what everyone is really competing on is not the architecture itself but the quality of the training data. That is the industry consensus.

Author: Su Jianxun

In embodied intelligence, the need for data collection may be one of the few points of consensus.

Trained on massive amounts of data, large language models gave rise to ChatGPT, and the "Scaling Law" has become an article of faith among AI practitioners. The physical world in which embodied intelligence operates, however, has no Internet-scale corpus to draw on. Whether it comes from humans or from robots, the real-world data available today is not enough to reproduce a GPT moment.

How to collect data, how much can be collected, and how to ensure its quality are therefore among the most important questions facing embodied intelligence practitioners today.

One robotics company is now trying something new in data collection. Luming Robotics has launched FastUMI Pro (backpack version), the world's first backpack-style UMI data collection device, and plans to deploy 10,000 units in 2026 for systematic data collection across six real-world settings: industry, households, hotels, restaurants, shopping malls, and offices.

The world's first backpack-style UMI data collection device: Luming FastUMI Pro (backpack version)

A brief explanation of UMI (Universal Manipulation Interface): it is a low-cost data collection and learning framework jointly proposed by Stanford University, Columbia University, and the Toyota Research Institute. Unlike the remote-operation (teleoperation) data collection used by most of the market, UMI is decoupled from the robot body, so the collected data is not tied to one specific robot or one type of robot form.

At a media briefing in early 2026, Yu Chao, founder and CEO of Luming Robotics, compared the efficiency and cost of UMI with remote operation:

"For the same task, such as folding clothes, remote-operation data collection takes 50 seconds and costs 3 to 5 yuan. With the FastUMI Pro, it takes only 10 seconds and costs less than 0.6 yuan. Collection efficiency improves greatly and the cost is lower."

Luming Robotics was founded in September 2024. Its founder, Yu Chao, previously headed the embodied robot business at Dreame and has nearly 10 years of R&D experience in embodied robots. He led the R&D and mass production of thousands of Xiaomi CyberDog units. Co-CTO Ding Yan was among the first in the Chinese mainland to work on UMI and the first to take it from the laboratory into industry.

Quantity and Quality

In 2025, Luming reached a data production capacity of 100,000 hours through its self-built data collection center. Yu Chao predicts that in 2026 the data scale of leading embodied models will start at 1 million hours.

Luming's most important goal for 2026 is to build an annual UMI data collection capacity of one million hours, which means it needs collection methods that scale much further.
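
A back-of-the-envelope calculation gives a feel for the size of this target. The sketch below makes an assumption the company did not state: that the full one-million-hour target is spread evenly across the 10,000 backpack devices planned for 2026. In practice the self-built data collection center may also contribute, so the per-device figure is only indicative. All names are hypothetical.

```python
HOURS_2025 = 100_000            # capacity reported for 2025
TARGET_HOURS_2026 = 1_000_000   # annual capacity targeted for 2026
PLANNED_DEVICES = 10_000        # backpack devices planned for 2026

# Assumption (not from the company): every hour comes from a backpack
# device and the load is spread evenly across all devices.
growth_factor = TARGET_HOURS_2026 / HOURS_2025
hours_per_device_per_year = TARGET_HOURS_2026 / PLANNED_DEVICES

print(f"Required growth over 2025: {growth_factor:.0f}x")
print(f"Hours per device per year: {hours_per_device_per_year:.0f}")
```

Under that assumption, the target is ten times the 2025 capacity, and each device would need to record about 100 hours a year, or roughly two hours a week.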

"Robot training data should not be so expensive and scarce. The data generated by humans during their operations in the physical world are everywhere, but they have not been well collected." Yu Chao said.

The backpack-style FastUMI was created to solve this problem. It is a portable, standardized data collection workstation that efficiently converts real-world operations into high-quality training data.

Until now, most embodied data collection has taken place in laboratories or in a single scenario, so robots often repeated a handful of actions in one setting. The resulting data lacks diversity, which in turn limits the model's ability to generalize.

Luming Robotics therefore wants a more portable approach: put the collection tools directly into a backpack so that data can be gathered more easily in real-world settings.

In terms of coverage, Luming Robotics aims to span six core scenarios (industry, households, hotels, restaurants, shopping malls, and offices), subdivided into 30 task categories, to build a structured, multi-dimensional operational data system.

An integrated "collection - training - inference" closed loop is the core of Luming's data infrastructure, and the large-scale collection effort depends on it. Using the FastUMI Pro, Luming's dual-arm embodied robot MOS completed end-to-end verification of a factory quality inspection task, from data collection through policy training to model inference, within 5 hours. After the FastUMI Pro was deployed in Hefei, collection, training, and deployed inference in a real-world setting took only 7 hours.

FastUMI Pro completed the "data collection - policy training - model inference" closed loop in a part-sorting task

Data First for Model Training

Beyond the backpack-style collection tools, Luming has also built a "data supermarket" that turns collected data into standardized, tradable products, so customers can buy standardized operational data directly on the official website. As an embodied intelligence company, Luming has clearly centered its current strategy on data.

Luming Robotics' series of moves reflects the most urgent business need in embodied intelligence today.

At the media briefing at the beginning of the year, Ding Yan, co-CTO of Luming Robotics, shared his views on data and models with media outlets including "Intelligence Emergence".

"My background is in model development. I used to train models all the time, and in doing so we found a big problem," Ding Yan said. "To train a very good model, you must have a good data pipeline, covering data production, data evaluation, and data screening. Building that pipeline itself takes time."

Having seen how the industry was actually developing, Ding Yan and his team decided that, if forced to choose between models and data, they would put data first and hold off on training models.

"Because in the end, when you compete on model architecture, what you are really competing on is not the architecture but the quality of the data. That is the industry consensus," Ding Yan said.

The ceiling of embodied intelligence depends heavily on the scale and quality of real-world operational data. When general-purpose data can be ordered online like hardware, the barrier to training industry models will drop significantly, and embodied intelligence can move from bespoke exploration to engineered production.

From "10,000 devices collecting data simultaneously" to "ordering general-purpose data online", Luming is turning the "ubiquitous but uncollected" operational data of the physical world into standardized infrastructure that can be supplied at scale, and is building a data-driven ecosystem on top of it. When data is no longer scarce, robots will truly become general-purpose.
