
In the rapidly developing embodied AI track, Reconova's robots are already carrying luggage at the airport.

Xiaoxi · 2026-04-29 13:33
Reconova is no longer just a visual intelligence company.

While robots race down every track, one company has carved out its own path, evolving from visual intelligence into embodied intelligence.

On April 29th, at the 3rd China Embodied Intelligence and Humanoid Robot Industry Conference, Reconova delivered a keynote speech on breaking the deadlock in the scenario-based deployment of embodied intelligence.

The company, which has spent 14 years in the AI field, sent a clear signal to the outside world: machines that can understand the world are now starting to act. In a track where everyone talks about generalization and scale, it aims to be the player that focuses on practical deployment.

From Visual AI to Embodied Intelligence

Reconova was founded in 2012 and has lived through two distinct AI eras.

In the AI 1.0 era, the core proposition of technology was perception: how to enable machines to “understand” images, recognize objects, and comprehend scenes. This was the golden decade for the large-scale implementation of deep learning and also a frenzied decade for visual AI companies to expand their territories.

Over that decade, the visual AI track underwent a brutal shakeout. At its peak, thousands of Chinese companies wore the “AI vision” label; capital poured in and valuation bubbles piled up rapidly. Then came a long deflation: the financing environment tightened, commercialization proved difficult, and homogeneous competition squeezed profits. Around 2019, large numbers of players ran into trouble; once-celebrated unicorns being sold at a discount, or shutting down outright, was no longer news.

In visual AI, security and finance were the industry's two most crowded arenas. Reconova, by contrast, focused on comparatively inconspicuous scenarios: passenger flow centered on civil aviation airports, commercial real estate (chiefly shopping malls), and driver-assistance safety for freight commercial vehicles.

From the outside, this looked like a restrained decision. Yet precisely because of it, Reconova became one of the few visual AI companies to survive from the small-model era into the large-model era and remain at the forefront of a fiercely competitive industry.

The benefit of focus is a deeper moat. According to Frost & Sullivan, by 2024 revenue Reconova ranks first in China's visual intelligence product market for civil aviation enterprises, with an 8.9% market share. Its products cover one-third of domestic civil airports, and coverage reaches two-thirds among large hub airports with annual passenger throughput above 10 million. Behind this lie billions of rounds of scenario-specific training, a deep understanding of civil-aviation business processes, and long-standing customer relationships with airport operators.

In the AI 2.0 era, the technological proposition has changed. Large models bring not only better perception but also an extension from understanding to action. For Reconova, this inflection point is precisely the right moment to take a step forward.

Zhan Donghui, founder and chairman of Reconova, welcomes the change: “For the past 12 years, we have been working on the ‘eyes’ – perceiving and understanding the physical world through vision. Now we are moving toward the ‘brain’ and ‘hands’. On the basis of understanding the world, we are starting to make decisions, take actions, and help people get things done.”

This means Reconova is no longer just a visual intelligence company. It is extending its technological focus from perception and cognition to decision-making and execution, forming a complete closed loop from the “eyes” to the “brain” to the “limbs”. In product positioning, it is becoming a provider of embodied intelligence products for commercial scenarios and complex operations. This is Reconova's new label, and the specific track it has chosen within the popular field of embodied intelligence.

The Real Moat in a Noisy Track

The most mainstream narrative in embodied intelligence at present is generalization. The more scenarios a robot can adapt to, the more appealing its story is, and the greater its valuation potential. Under this logic, companies focusing on vertical scenarios seem to be at a natural disadvantage in the narrative.

Zhan Donghui believes general-purpose capability is the stage for platform companies, which demands scale, an ecosystem, and first-mover data network effects. Barriers in vertical scenarios, however, are never built by piling up parameters. They come from specific scenarios, a deep understanding of customers' business processes, and the know-how accumulated over countless rounds of solving problems alongside customers, none of which more computing power can buy.

On the technology side, Reconova has built its competitive stack as a three-layer architecture.

The first layer is the perception base. This is a direct transformation of 14 years of visual algorithm accumulation: object recognition, spatial understanding, pose estimation, and real-time perception in unstructured environments.

The second layer is the decision-making layer, with a VLA (Vision-Language-Action) large model as the core self-developed direction. Reconova is building a VLA model for vertical scenarios, integrating visual perception, natural language understanding, and robot motion planning into an end-to-end framework. This turns the robot into an intelligent agent that understands scene semantics, makes judgments in context, and generates corresponding action sequences. Compared with general VLA models, Reconova has further introduced force and tactile sensing, bringing the robot's behavioral decision-making closer to the way humans unify multi-dimensional information when deciding. Reconova calls this innovation VTFLA.
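As a rough illustration of what a decision step fusing vision, language, force, and tactile inputs might look like: the article gives no implementation details of VTFLA, so every name, field, and threshold below is a hypothetical sketch, not Reconova's actual model (which would be a learned end-to-end network, not hand-written rules).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """Hypothetical multimodal input bundle; fields are assumptions."""
    image_features: List[float]   # output of the perception base (layer 1)
    instruction: str              # natural-language task description
    force_reading: float          # wrist force sensor, in newtons (assumed)
    tactile_contact: bool         # gripper contact flag (assumed)

def decide_action(obs: Observation) -> List[str]:
    """Toy stand-in for an end-to-end policy: fuse modalities and emit
    an action sequence. Rules and the 50 N limit are illustrative only."""
    actions: List[str] = []
    if "luggage" in obs.instruction and obs.image_features:
        actions.append("approach_target")
    if not obs.tactile_contact:
        actions.append("close_gripper")        # no contact yet: grasp
    elif obs.force_reading > 50.0:
        actions.append("regrip_gently")        # excessive force: adjust
    else:
        actions.append("lift_and_stack")       # stable grasp: proceed
    return actions
```

The point of the sketch is only the interface: force and tactile channels enter the same decision step as vision and language, rather than being handled by a separate controller.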

The third layer is the execution layer: self-developed execution components that supply the “hands” and “body”. However strong perception and decision-making are, everything ultimately depends on the quality of physical action. Reconova's in-house work on the execution side addresses reliable operation in unstructured environments, including grasping strategies, force control, and adapting end-effectors to different object shapes. The engineering bar here is extremely high, and it is the hardest gap to cross between demonstration and mass deployment.

On the commercialization path of embodied intelligence, Zhan Donghui believes that complex, unstructured dedicated scenarios will reach commercialization before general-purpose ones.

General-purpose robots face dual constraints of technology and cost: they must have sufficient generalization ability while bringing per-unit cost down to a level enterprise customers will accept. Meeting both conditions at once still takes time at the current stage. By contrast, dedicated robots deeply adapted to a single scenario can be fully optimized for known constraints and are more commercially feasible in cost structure.

An Underestimated Tough Nut to Crack

Civil aviation is Reconova's first entry point into embodied intelligence, and the domain where its accumulated expertise runs deepest. The first deployment scenario it chose is baggage handling.

Baggage handling has always been one of the most labor-intensive links in the civil aviation industry. Difficulties in recruitment, high staff turnover, and large fluctuations in manual efficiency affected by weather and flight schedules have been problems that have plagued airports for many years.

Doing this scenario well is much harder than it looks. Zhan Donghui notes that the baggage handling area is a highly unstructured operating environment, concentrating almost all of the conditions least favorable to robot deployment.

Firstly, there is an extreme diversity of object shapes: Passengers' checked luggage is not standardized. Suitcases, canvas soft bags, cartons, and oversized irregular items often appear mixed in the same batch. Every piece of luggage the robot faces is a new grasping challenge. It needs to determine where to grasp, what kind of force to use to ensure the luggage is stable and undamaged, and finally find the best position for stacking.

Secondly, there is the irregularity of the space itself: the underground transfer area of the terminal was not designed for robots. Corridor widths vary, gaps between equipment are narrow, and the robot's path must be planned in real time.

Finally, and most importantly, the scenario demands high-density human-robot collaboration: in civil aviation operations, the accuracy and timeliness of baggage handling bear directly on flight punctuality and passenger satisfaction. To move all of a flight's baggage within a short window, human-robot collaborative operation is currently the best approach. But working in parallel means high-frequency, close-range spatial crossings between the two, where any lag in perception or decision-making can create safety risks.
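The article does not describe Reconova's actual safety logic. A common pattern for this kind of close-range collaboration is speed-and-separation monitoring: the robot's allowed speed shrinks as a person approaches, reaching zero inside a hard stop radius. The sketch below uses assumed radii and speeds purely for illustration.

```python
def speed_limit(distance_to_human_m: float, max_speed: float = 1.5) -> float:
    """Scale the robot's allowed speed (m/s) by distance to the nearest
    person. All thresholds are assumptions, not Reconova's parameters."""
    STOP_RADIUS = 0.5   # halt entirely within 0.5 m (assumed)
    SLOW_RADIUS = 2.0   # begin slowing inside 2.0 m (assumed)
    if distance_to_human_m <= STOP_RADIUS:
        return 0.0
    if distance_to_human_m >= SLOW_RADIUS:
        return max_speed
    # Linear ramp between the stop radius and the slow radius.
    frac = (distance_to_human_m - STOP_RADIUS) / (SLOW_RADIUS - STOP_RADIUS)
    return max_speed * frac
```

A scheme like this is only as good as the perception feeding it, which is why the article stresses that any delay in perception or decision-making becomes a safety risk.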

This is exactly why general robots cannot operate in this area at present. The strong generalization ability of general robots means they can “be used” in multiple scenarios. However, “being usable” and being stably usable in a harsh production environment are two completely different standards. At the same time, the current cost structure of general robots also determines that they cannot achieve an acceptable ROI in such labor substitution scenarios for the time being.

Reconova's solution is an intelligent robot purpose-built for the airport baggage handling scenario. At the 2025 International Airport Expo, in a simulated terminal transfer area, the Xiaoyi Baggage Handling Robot smoothly moved luggage of various shapes from the end of the sorting system to the downstream baggage trailers and stacked it efficiently, bridging one of the weakest links in the intelligentization of civil aviation.

One core design is the industry's first human-robot collaborative operating mode. Built on close consultation with customers, the engineering design lets people work safely and naturally side by side with machines: robots handle the high-frequency, heavy handling and stacking, while humans step in where tasks fall outside the robot's capability boundary. With each side playing its part, overall efficiency far exceeds purely manual work.

Zhan Donghui said that in actual airport pilot projects, the Xiaoyi Baggage Handling Robot has significantly reduced both headcount requirements and workers' physical load, while raising system throughput by 30% and cutting the baggage damage rate to 0.12%. This will be one of the drivers of purchases by airport operators.

Reconova is currently running live tests at multiple airports and plans to commercialize formally in the second half of this year. While expanding at home, it has also put the civil aviation markets of Southeast Asia and the Middle East, which share similar baggage-handling pain points, on its overseas roadmap.

In this booming embodied intelligence track, Reconova has chosen a more concrete path: do one difficult thing well, and let customers see quantifiable value in real business scenarios.

If one were to place Reconova in the current embodied intelligence landscape, it is neither a general-purpose robot company nor a traditional visual AI company, but an embodied intelligence product provider focused on complex scenarios and complex actions.

The craze in the robot track will eventually subside, but products proven in harsh scenarios will not. Amid the noise, choosing to go narrow and deep takes perseverance. It is precisely this choice that has let Reconova occupy a genuinely rare niche in the noisiest craze of embodied intelligence, and become an embodied intelligence company worth watching.