A $41.8 million seed round draws attention: a South Korean physical AI startup exposes the real deployment logic of video pre-training
In 2026, the debate over the commercial value of VLA (vision-language-action) video pre-training in the global embodied intelligence sector continues to intensify.
On one hand, financing enthusiasm in the market runs high; on the other, skeptical voices persist. Many argue that training robots on Internet videos is a passing gimmick that cannot support real industrial deployment. However, a financing deal by the South Korean startup RLWRLD offers a more concrete case for observing the technical route and commercial value of the whole sector.
RLWRLD has recently raised a cumulative 60 billion South Korean won in seed financing, equivalent to $41.8 million. For a physical AI company still refining its technology at an early stage, a round of this size is remarkable. More noteworthy still, all the funds in this round come from industrial capital in Japan and South Korea, with LG, SK, Lotte, CJ Logistics, and ANA Airlines all involved.
Sustained investment by industrial capital reflects the real judgment of front-line industry on the value of deploying the technology. The development path of this South Korean company shows how video pre-training adapts to different scenarios and lets us reassess the real value of the different technical paradigms in the field.
Understanding the scenario stratification of video pre-training through deployment practice
In the past, most industry discussion of video pre-training stayed on a single dimension: whether the technology was useful or merely hype. RLWRLD's deployment model brings the stratification of the field into focus.
Mainstream video training methods fall into two quite different forms, suited to different development stages and commercial scenarios.
Many Chinese embodied AI startups scrape publicly available short videos from across the web for the base pre-training of their models. Large numbers of household, everyday-life, and industrial clips can be obtained for free, and the data volume is large enough for a model to quickly build basic visual cognition and action logic.
This method suits an early cold start. Startup teams can iterate on the model quickly and test across many scenarios without paying heavily to collect real-robot data, and can support multi-industry pilots and technical validation.
However, publicly available web videos have many inherent flaws. Camera angle, lighting, and staged shooting all introduce data noise. Moreover, the videos record human limb movements, which differ from the mechanical structure and force-bearing characteristics of robots. In industrial operations that demand high precision and stability, performance is often unstable.
According to "Wall Street Tech Eye", RLWRLD has chosen another method closer to industry. The team abandons publicly available web data and goes deep into vertical real-world scenarios. Its staff have been stationed long-term at on-site workplaces such as hotels, logistics warehouses, convenience stores, and airline logistics operations, recording the real working processes of on-duty employees with wearable devices.
Collecting data at these workplaces is relatively laborious, but the real-world footage has the advantage of containing no extraneous visual interference and matching the real working environment and standard operating procedures exactly. The model no longer learns from fragmented web footage but from the complete operating logic of established front-line jobs.
This kind of data matches the large number of repetitive, standardized, and fragmented tasks in the service industry. Robots do not require the site to be modified or repeated reprogramming. They can fit into the existing workflow and quickly take on tasks such as sorting, tidying, storage, and simple assistance.
The two video training models correspond to two types of market demand. Web-wide videos suit quickly building model cognition and expanding the range of scenarios, while real-world workplace videos are better suited to building stable, billable, and mass-producible commercial capabilities. The difference in scenarios directly determines the deployment results and commercial value of the technology.
The underlying reasons why the real-world video route can work
A seed round of over $40 million is probably not just capital-market hype. More precisely, this deployment model fits current industrial reality in Japan and South Korea and is internally consistent across demand, technology, ecosystem, and business model.
The labor shortage caused by aging populations in Japan and South Korea has spread to many specific job categories in the service industry.
Across scenarios, basic jobs such as hotel room maintenance, warehouse unpacking and sorting, supermarket shelf stocking, and airport logistics assistance have high staff turnover and labor costs that rise year by year. These repetitive, low-value-added jobs have long been difficult to fill.
Traditional automation equipment is rigid and can only serve fixed production lines. It cannot handle the flexible, changeable operating scenarios of the service industry, which has long lacked low-cost flexible automation solutions.
RLWRLD's real-world training model fills this market gap. Robots trained on data from real jobs can adapt to real on-site operating environments. The barrier to deployment is low and the cost of modification is controllable. Enterprises can upgrade to automation at a lower cost of replacing labor, and their willingness to pay and demand for deployment are real.
At the technical level, real-world video training avoids the deployment gap common in the industry. Models trained on web-wide videos often understand the image but make mistakes in actual operation. The core reason is that the physics of human motion does not match that of mechanical hardware. Such a model can only reproduce visual appearance and struggles to adapt to the rules of real physical operation.
According to "Wall Street Tech Eye", RLWRLD uses original operating videos of front-line employees as its core data and combines them with physical parameters such as force sensing and motion trajectories for training and optimization. The model learns standardized, reusable job procedures, which greatly improves operating stability and reduces the need for manual remote intervention. The resulting reliability is closer to industrial requirements.
The ecosystem and business model further amplify these deployment advantages. Past investment records show that RLWRLD's investors are mostly industrial giants with large numbers of physical sites. While investing, they open up their own business scenarios, providing deployment pilots and real business orders. The company does not need to spend heavily on market development. It has had a stable data source and revenue scenarios since its early days, and it can deliver projects and iterate the model at the same time, forming a sustained positive cycle.
At the same time, the company focuses on research and development of the algorithmic "brain" and stays out of asset-heavy hardware manufacturing. All funds and staff are concentrated on model optimization and data infrastructure, giving higher capital efficiency and a steadier pace of commercialization.
Taken together, these factors have given this vertical-scenario, real-world video route a complete commercial path in the Japanese and South Korean markets: one that can be deployed, monetized, and iterated.
Re-examining technology selection and industry disputes
RLWRLD has shown through a large number of industrial deployments that video pre-training has real industrial value. However, the market's perception of this technical route has long been clouded by another, lighter and lower-cost training paradigm, resulting in sharp disagreement within the industry.
In contrast to the South Korean company's path of going deep into real-world videos, another mainstream approach has emerged in China's physical AI sector, with pre-training on publicly available web-wide video as its foundation. Qianxun Intelligence is the fastest-growing, most closely watched, and most controversial leading company on this route.
Qianxun Intelligence was co-founded by Han Fengtao, former CTO of Luoshi Robotics, and Gao Yang, an algorithm expert with a Berkeley background. Within two years of its founding it has become one of the most prominent startups in China's embodied intelligence field. Like RLWRLD, the company builds a video-driven physical AI system, but its choices are completely different.
Qianxun relies on large volumes of publicly available web videos for general pre-training, then fine-tunes for specific scenarios using data from its self-developed wearable devices and industrial teleoperation. It takes a full-stack route across hardware and software, develops its own complete humanoid robots, and focuses on deployment in Chinese industrial scenarios such as power batteries and high-end manufacturing.
One South Korean and one Chinese, one real-world and one web-wide, one pure-algorithm and one full-stack: these two paths, which share an origin but run in opposite directions, are the two major samples in today's video pre-training field. Given China's industrial environment, Qianxun Intelligence's choice of route has ample local rationale.
China's manufacturing sector spans many categories, and its scenarios are highly fragmented. Factory demand is typically small-batch, multi-category, and fast-iterating, and there is no unified, standardized operating process across the industry. Copying RLWRLD's model of collecting real-world footage scene by scene and job by job would cost too much overall and expand too slowly to match the scale and complexity of the Chinese industrial market.
Relying on publicly available web videos for base pre-training is the most cost-effective and fastest-scaling cold-start method for Chinese robotics companies. It lets a model quickly accumulate general world knowledge, adapt to a variety of non-standard industrial scenarios, and quickly produce benchmark deployment projects.
At the same time, Qianxun does not rely entirely on external web data. It builds its own real-robot data system to compensate for scenario mismatch and draws on China's complete robotics supply chain, so the full-stack in-house model also leaves room for growth toward long-term hardware mass production.
However, compared with RLWRLD's mature, stable, and sustainably monetizable business path, the shortcomings of Qianxun Intelligence's route are also prominent, and they are the core reason for the persistent skepticism in the industry.
The nature of web-wide video data means the model is good at generalization trials but struggles to build deep, standardized operating capability in any single industry. The result is many pilot projects but few large-scale paid deployments. Stable cash flow has been lacking for a long time, and high valuations tend to attract accusations of a bubble.
The problems of viewpoint deviation, image noise, and the embodiment mismatch between humans and machines in external web videos can never be fully eliminated. In complex industrial scenarios and high-precision flexible operations, robots still need manual remote intervention and assistance, and there is a clear gap in autonomous stability compared with the real-world training route.
Meanwhile, the full-stack in-house model means the company bears the high costs of algorithm, hardware, and large-scale data teams at the same time, so it burns cash faster. Yet mass production of the complete robot is proceeding at a relatively cautious pace, and the commercialization cycle is longer. With the capital market generally becoming more rational, faster deployment results are needed to prove the value of the technology.
This article is from the WeChat official account "Wall Street Tech Eye". Author: Park Jin-tak, Editor: cc Sun Congying. Republished by 36Kr with permission.