Unveiling the Embodied Data Industry Chain: The Road to Unicorn for a Data Company

IT Juzi (IT桔子), 2026-06-10 15:53
Industry benchmark Guanglun Intelligence has grown into a unicorn just three years after its founding, with a valuation exceeding 1 billion US dollars.

Recently, I came across a short video report that went like this: "A stay-at-home mom works from home and earns 3,000-4,000 yuan per month. She teaches robots to fold clothes and wipe tables."

This news piqued my interest, and not only because it points to an emerging occupation, the embodied intelligence data collector. More importantly, from the perspective of the entire industrial chain, what is this job worth, and how does capital price it?

I researched and analyzed reports from mainstream industry media. The results show a more than 10-fold gap between the hourly wage of stay-at-home embodied intelligence collectors and the final selling price of the data they create.

However, this is not a worry-free business, and challenges remain.

Understanding this business logic and the industrial chain behind it means understanding who plays the core role in the ecosystem, and how Guanglun Intelligence, a company focused on embodied data, grew into a unicorn in three years. Every entrepreneur interested in the embodied intelligence track should study it.

I. The Panorama of the Industrial Chain: A Four-layer Structure, and Who Sits Where?

The embodied intelligence data industrial chain can be divided into four layers.

The bottom layer is the collection layer: laborers who collect and provide basic data on physical actions.

There are mainly four types of people in this layer:

Stay-at-home collectors: The moms in the news reports wear gripper devices and fold clothes and wipe tables in their living rooms. The advertised hourly wage is 30 yuan, but the effective hourly wage is about 17 yuan, for a monthly income of 3,000-4,000 yuan.

Site collectors: Junior college graduates who work full-time in a data collection center and wear motion capture equipment. Their daily wage is 180-250 yuan, equivalent to an hourly wage of 22-31 yuan.

Pre-training field for embodied intelligence robot data collection in Hefei. Source: Internet

Real-machine remote operators: Professional technicians wear force-feedback gloves and teach robots hands-on to complete precise operations. Their daily wage is over 300 yuan, and the hourly wage is over 75 yuan. This is the most accurate and most costly collection method.

Picture of real-machine remote operation of robots. Source: Internet

UMI collectors: They wear non-physical collection devices such as Lumion FastUMI Pro. The robot body does not need to participate, and the cost drops to 1/5 of that of remote operation, which suits more refined data collection.

The second layer is the platform layer: intermediaries connecting collectors and data companies.

The platform layer's core role is that of organizer: it connects supply and demand, manages processes, sometimes provides physical space and operation scenarios, and takes a cut along the way.

In May this year, JD.com built the first national embodied intelligence data collection community in Suqian, planning to mobilize more than 100,000 employees and 500,000 industry personnel. In addition, there are countless third-party outsourcing teams and small organizers.

The outsourcing platform takes orders from data companies and subcontracts them to collectors, keeping a 30%-50% margin in between.
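To make the commission arithmetic concrete, here is a minimal sketch. Only the 30%-50% range comes from the text; the 100-yuan order price is a hypothetical figure chosen for illustration, and the function name `collector_payout` is likewise an assumption.

```python
def collector_payout(order_price_yuan: float, platform_cut_pct: int) -> float:
    """Amount passed on to the collector after the platform keeps its cut."""
    return order_price_yuan * (100 - platform_cut_pct) / 100


# Hypothetical order worth 100 yuan placed by a data company (not a figure from the article).
ORDER_PRICE = 100

payout_low = collector_payout(ORDER_PRICE, 50)   # platform keeps 50%
payout_high = collector_payout(ORDER_PRICE, 30)  # platform keeps 30%

print(f"Collector receives {payout_low:.0f}-{payout_high:.0f} yuan "
      f"of every {ORDER_PRICE} yuan ordered")
```

Under these assumptions, between half and seven-tenths of each order reaches the person doing the collecting.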

In addition to outsourcing platforms, this industrial chain also depends on infrastructure builders who provide the equipment used to capture collection operations.

For example, equipment providers such as Lumion Robotics, Mifeng Technology, and Lingyun Optics research, develop, and produce hardware. A set of motion capture equipment costs 100,000-500,000 yuan, and a set of UMI equipment costs about 2,800 yuan. Their profit model is easy to understand: leasing or selling equipment.

The third layer is the data layer: the core player in the entire industrial chain, an "alchemist" that transforms data into assets.

The representative enterprises are Guanglun Intelligence, Zhiyu Jishi, Tashi Zhihang, and Mifeng Technology.

These companies clean, annotate, align, and simulation-enhance the raw data collected at the bottom layer, package it into a trainable data product, and sell it to downstream customers.

The fourth layer is the application layer: the customers who pay for the data.

There are three types of representative enterprises:

The first type is humanoid robot body companies, such as Unitree, Ubtech, Zhiyuan, Galaxy Universal, and Tesla, which need real-machine data to train models.

The second type is world model and large model teams, such as Google DeepMind, NVIDIA, and World Labs, which need human behavior data to understand the physical world.

The third type is industrial application parties, such as factories, logistics, and medical care, which need scenario-adapted data.

Viewed as a whole, the embodied intelligence data industrial chain is a classic "pyramid model": the bottom layer consists of a large number of low-cost laborers, the middle layer includes platforms that take a cut and companies that sell equipment, and the top layer is data companies that control data assets and resale capabilities.

The position of collectors is very clear: they are the fuel for the entire chain.

II. The Data Selling Model: The Gap from 17 Yuan to 300 Yuan

Next, let's see how the core links of the industrial chain make money.

First, calculate the collection cost.

The effective hourly wage of stay-at-home collectors is 17 yuan. The daily wage of site collectors is 180-250 yuan, which on an 8-hour basis works out to an hourly wage of 22-31 yuan.

The cost of the UMI collection solution is 1/5 of that of traditional remote operation. Lumion's FastUMI Pro reduces the collection time for a single piece of data from 50 seconds to 10 seconds, and the estimated hourly cost is about 55 yuan.

The cost of real-machine remote operation is the highest. At scale, the effective data cost per hour is about 275 yuan (equipment depreciation + labor + scenario). Industry sources say it can reach thousands of yuan per hour for small-scale collection.
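The cost figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only numbers stated in the text; the function and variable names are illustrative assumptions.

```python
HOURS_PER_DAY = 8  # the article's stated basis for converting daily to hourly wages


def hourly_from_daily(daily_wage_yuan: float, hours: int = HOURS_PER_DAY) -> float:
    """Convert a daily wage into an hourly wage."""
    return daily_wage_yuan / hours


# Site collectors: 180-250 yuan per day.
site_low = hourly_from_daily(180)
site_high = hourly_from_daily(250)

# Real-machine remote operation at scale, and UMI at one fifth of that cost.
remote_cost_per_hour = 275
umi_cost_per_hour = remote_cost_per_hour / 5

print(f"Site collector hourly wage: {site_low}-{site_high} yuan")
print(f"UMI hourly cost: {umi_cost_per_hour} yuan")
```

The exact site-collector range is 22.5-31.25 yuan, which the article states as 22-31 yuan, and one fifth of 275 yuan gives the 55-yuan UMI estimate.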

Then, calculate the selling price.

A survey by The Paper in May 2025 gave the industry pricing range: embodied intelligence data overall is priced at 200-500 yuan per hour. Real-machine data is the most expensive, with a market price of 500-1,000 yuan per hour.

Yao Maoqing, the CEO of Mifeng Technology, revealed that the price of non-physical data, which does not rely on a specific robot body, will eventually converge to one-half to one-third of the price of real-machine data, that is, 300-400 yuan per hour.

Now, calculate the difference.

What does this data indicate?

The more "low-end" the collection method, the larger the markup multiple.

Real-machine remote operation costs about 275 yuan per hour, and the terminal selling price is 800 yuan, only 2.9 times as much. The stay-at-home collector gets 17 yuan, and the data company can sell the resulting data for 300 yuan, a 17.6-fold gap.

The 283-yuan difference between the 17-yuan hourly wage and the 300-yuan selling price is divided up layer by layer among platform commissions, the data company's technical processing, equipment depreciation, and the resale premium of data assets.
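The multiples quoted above follow directly from dividing selling price by collection cost. A minimal sketch, using the article's own figures (the 800-yuan real-machine price is the representative point the article picks within the 500-1,000 yuan range); the helper name `markup_multiple` is an assumption:

```python
def markup_multiple(cost_per_hour: float, price_per_hour: float) -> float:
    """How many times the selling price exceeds the collection cost."""
    return price_per_hour / cost_per_hour


# Real-machine remote operation: 275 yuan cost, 800 yuan selling price.
remote_multiple = markup_multiple(275, 800)

# Stay-at-home collection: 17 yuan effective wage, 300 yuan selling price.
home_multiple = markup_multiple(17, 300)

# Absolute gap per hour for stay-at-home data.
home_gap_yuan = 300 - 17

print(f"Remote operation: {remote_multiple:.1f}x")
print(f"Stay-at-home:     {home_multiple:.1f}x, gap of {home_gap_yuan} yuan per hour")
```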

But this is not the real profit source of data companies.

III. The Magic of Guanglun Intelligence: Selling Not Once, but Ten Times

Guanglun Intelligence is a benchmark enterprise in this field. It raised funds and grew rapidly, and just three years after its founding it became a new unicorn this year, with a valuation of over 1 billion US dollars.

According to officially disclosed data, as of early 2026 it had cumulatively delivered over 1.5 million hours of high-quality human data, covering 25,000 environmental nodes and 100,000 types of tasks. In the first quarter of 2026, new orders reached 550 million yuan.

A rough calculation: 550 million ÷ 1.5 million hours = an average selling price of about 367 yuan per hour. That seems to be around the industry level, so is the profit margin not very high?
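The rough calculation can be checked in a couple of lines. Note that, as in the article, it pairs a quarterly order figure with a cumulative delivery figure, so it is only an order-of-magnitude estimate; the variable names are illustrative.

```python
# Figures as stated in the article.
new_orders_q1_2026_yuan = 550_000_000   # new orders in Q1 2026
cumulative_hours_delivered = 1_500_000  # cumulative hours delivered as of early 2026

avg_price_per_hour = new_orders_q1_2026_yuan / cumulative_hours_delivered

print(f"Average selling price: about {avg_price_per_hour:.0f} yuan per hour")
```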

This calculation rests on a key assumption: that the data can be sold only once. In fact, that is not the case.

Guanglun Intelligence summarizes this ability as the "data resale rate", that is, how many different customers and task requirements the data per unit hour can serve.

Yang Haibo, the co-founder of Guanglun Intelligence, said: "For data in high-quality scenarios, the resale rate has exceeded 10 times."

What does that mean?

The same piece of data is not sold to one customer and then finished.

It can be sold to Unitree, Ubtech, Zhiyuan, Galaxy Universal... Each additional sale has almost zero marginal cost (only some format conversion and scenario adaptation are needed), but the revenue is real.

This is the real business model of data companies: make a one-time investment in collection, and then spread that cost ever thinner through resale.

The essence of data is the same as that of software: the replication cost approaches zero. With each additional sale, the gross profit margin climbs.

This logic explains why the capital market values Guanglun Intelligence at 10 billion yuan. It is not because the company has 1.5 million hours of data, but because these 1.5 million hours may be sold as the equivalent of 15 million hours.
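The resale logic can be sketched as a simple margin model. The 300-yuan price and the 10x resale rate come from the article; the 150-yuan collection cost per hour and the zero per-sale adaptation cost are hypothetical inputs chosen only to show the shape of the curve, and `gross_margin` is an illustrative helper.

```python
def gross_margin(collection_cost: float, price_per_sale: float, n_sales: int,
                 adaptation_cost_per_sale: float = 0.0) -> float:
    """Gross margin on one hour of data that is collected once and sold n_sales times."""
    revenue = price_per_sale * n_sales
    total_cost = collection_cost + adaptation_cost_per_sale * n_sales
    return (revenue - total_cost) / revenue


COLLECTION_COST = 150.0  # hypothetical one-time cost per hour of data (assumption)
PRICE_PER_SALE = 300.0   # per-hour selling price cited in the article

for n in (1, 2, 5, 10):
    margin = gross_margin(COLLECTION_COST, PRICE_PER_SALE, n)
    print(f"sold {n:>2} times -> gross margin {margin:.1%}")

# 1.5 million hours at a 10x resale rate.
total_hours_sold = 1_500_000 * 10
print(f"Equivalent hours sold: {total_hours_sold:,}")
```

Under these assumptions the margin rises from 50% on the first sale to 95% after ten sales, because the one-time collection cost is spread over more revenue.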

IV. Under a Trillion-yuan Market Scale, Is Data Selling Sustainable?

Data shows that in 2026 China's embodied intelligence market is expected to exceed 1 trillion yuan, with data services accounting for more than 15%, or about 150 billion yuan.
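As a quick check on the market-size figure, a minimal sketch using the two numbers stated in the text (integer arithmetic avoids rounding noise; the variable names are illustrative):

```python
TOTAL_MARKET_YUAN = 1_000_000_000_000  # 1 trillion yuan, the article's 2026 estimate
DATA_SERVICE_SHARE_PCT = 15            # data services share, stated as "more than 15%"

data_service_market_yuan = TOTAL_MARKET_YUAN * DATA_SERVICE_SHARE_PCT // 100

print(f"Data service market: about {data_service_market_yuan // 10**9} billion yuan")
```

Since both inputs are stated as lower bounds, 150 billion yuan is best read as a floor under the article's assumptions rather than a point estimate.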

However, it is still unknown what share of that the third-party data resale model accounts for.

What can be foreseen is that this business model carries some hidden risks.

Leading robot manufacturers have long recognized the importance of building their own data and have started setting up in-house data centers. For example, Zhiyuan Robotics established the embodied intelligence data platform Mifeng Technology in 2026, and Unitree Technology will independently build a large-scale real-world data set with the funds raised from its IPO. Their demand for third-party data will decline.

In addition, some mature data, ranging from basic simulation data and general-scenario interaction data to certain annotated real-machine data sets, are gradually being opened up for free by leading enterprises and research institutions. This open-source data also puts some pressure on the data selling model.

For example, at the end of March this year, Unitree Technology announced that it had officially open-sourced UnifoLM-WBT-Dataset, a high-quality whole-body remote-operation real-machine data set for humanoid robots, covering 340 hours and a total of 1.89 million action trajectories.

However, the volume of open-source data worldwide is still small, and it has not yet reached meaningful scale.

The core competitiveness of third-party data service providers comes from accumulated data covering multiple scenarios.

However, for embodied intelligence to be deployed in complex industrial scenarios, what is needed is not laboratory data but data from real industrial sites. If leading robot manufacturers can obtain first-hand data on real-machine interaction and dynamic scenarios in factories directly through partnerships, the cost-performance advantage of third-party data will keep weakening.

In the future, the third-party data selling model may gradually narrow to two directions: serving small and medium-sized robot manufacturers that lack the ability to build their own data, and providing data for niche, segmented scenarios that manufacturers find hard to cover themselves.

This article is from the WeChat official account "IT Juzi" (ID: itjuzi521), author: Wu Meimei, published by 36Kr with authorization.