The next trillion-level opportunity: Physical AI is redefining urban services on the streets.

晓曦, 2025-12-05 11:14
Starting with the end in mind, behind the sanitation robots lies COVA Robotics' technological ambition.

In the past two years, the global spotlight has been on generative AI and humanoid robots. The former has reshaped the way humans interact with information, while the latter is regarded as the key carrier to achieving general intelligence.

However, beyond the hype of technological narratives, robots that can actually be deployed and create real-world productivity have quietly started operating on city streets. They appear along curbs, under overpasses, and in the blind spots of sidewalks: places that are easily overlooked but full of dust and danger. They start work at five in the morning, taking over thousands of labor-intensive, arduous, and even risky jobs.

In China, the sanitation industry is large in scale and has prominent pain points: it is labor-intensive, struggles to recruit workers, carries high safety risks, and must meet strict operating standards. Precisely because of these factors, the seemingly unglamorous sanitation scenario is a true measure of the value of intelligence.

COVA Robotics chose to start from the sanitation scenario based on this logic. "COVA's approach is quite practical. We focus on urban services, starting with sanitation, and deeply engage in the 'Dirty Work' in cities, releasing the labor force through robots," said Liao Wenlong, CTO of COVA Robotics.

In October 2025, COVA Robotics launched the R0, a small robot with dual-arm operation capabilities. It can undertake municipal operation tasks and also enter more complex scenarios such as property management. Meanwhile, COVA won the "Enterprise with Breakthrough in AI Application Scenarios of the Year" award at 36Kr WISE 2025 and took first place in the Shenzhen International Artificial Intelligence Sanitation Robot Competition, validating both its scenario deployment and its technological innovation.

A list of COVA's awards in recent international sanitation robot competitions

In other words, COVA's robots can now not only clean streets but also handle more complex tasks. From autonomous sanitation vehicles to municipal embodied-intelligence robots, COVA has made in-depth progress.

Perhaps this is the on-the-ground revolution of embodied intelligence, one that starts from the city streets.

Finding on-the-ground scenarios that deliver productivity value today

In many people's imaginations, a household vacuum cleaner can clean indoors, and if scaled up for outdoor use, it can clean the roads. It seems that the essence of cleaning is the same, only differing in volume and power.

They do have similarities. All robots facing the physical world essentially need two types of capabilities: one is the ability to move autonomously in the environment (Navigation), and the other is the ability to perform specific tasks (Operation). "In this regard, RoboTaxi, sanitation vehicles, and household vacuum cleaners are all combinations of different capabilities in these two dimensions," Liao Wenlong further explained.

However, in real-world scenarios, from perception systems to decision-making ability, and from task goals to safety boundaries, they are fundamentally different technological species.

Household cleaning is a product of the "lazy economy," an optional extra. Sanitation operations, in contrast, are part of the underlying logic of how a city runs: a task that must be completed. Compared with industrial robots transforming assembly lines, the demand for sanitation robots is more urgent and concrete. The working environment is harsh and risky, so recruitment has long been difficult and staff turnover high, yet a city's basic sanitation cannot be neglected.

This is also COVA's starting point in choosing scenarios: the hope that embodied intelligence can truly change productivity. As Liao Wenlong said, "We are developing AI sanitation robots with the goal of future generalized physical AI, and we are landing today on AI sanitation robots that have productivity value."

The sanitation scenario is the most commercially viable entry point for deploying embodied intelligence today, and also the most challenging battlefield. On the one hand, it has the typical ToB characteristics of a high-frequency, essential, paid service, which makes it well suited to demonstrating commercial value through ROI. On the other hand, it naturally contains four difficult hurdles for this generation of Physical AI. Only a few companies can clear them, which forms a natural technological barrier.

The first hurdle is the unstructured environment. There is no strict "drivable area" on curbs, narrow sidewalks, and the edges of green belts. Humans can complete tasks in these areas intuitively, but robots need to truly understand spatial relationships and the purpose of the operation. This forces the system to shift from modular logic to an end-to-end world model that directly understands operation intentions.

The second hurdle is dynamic safety decision-making, which arises from negotiating with other road users at intersections and around obstacles. Passing through intersections and avoiding pedestrians and non-motorized vehicles essentially involves predicting possible future consequences, rather than simply calculating current position and speed. This places higher demands on the model's "world understanding" ability.

The third hurdle is the high precision required for edge-following, where narrow spaces leave little tolerance for execution error. If the robot sweeps too far from the curb, the gap stays dirty; if it gets too close, it may scrape the curb or roadside facilities. Rough safety margins do not work here, and simple distance thresholds do not easily solve the problem. The robot needs to learn an intuitive judgment similar to a human driver "looking in the rear-view mirror."

The fourth hurdle is the control problem caused by the tight coupling of movement and operation. A sanitation vehicle often has to drive, control its rolling brushes and baffles, and chase blown-away garbage all at once. Each adjustment affects both the overall operation result and the vehicle's posture.

Because of these real and specific difficulties, the sanitation robot field has long had few players. If the robots are too simple, they cannot truly replace humans; if they are intelligent enough, they face extremely high technological thresholds and long investment cycles.

For COVA, choosing the sanitation field is not only based on the judgment of industry pain points but also a positive response to its own technological path. If Physical AI is really going to enter the real world, this will be the place where it is first tested and where its value is first demonstrated.

COVA's "Unicorn" robot demonstrates its operation capabilities

Building the intelligent link to enable AI to enter the physical world

Completely different from traditional autonomous driving, which only solves the problem of "getting from A to B," the biggest challenge in the sanitation scenario is how to make the robot work while moving in the city. That is, the robot must understand space, tasks, and the changing world simultaneously.

In the past decade, the industry has generally adopted a "decoupled" technical framework: perception identifies obstacles, prediction calculates trajectories, and decision-making and control adjust the vehicle body and operating devices according to a set of engineering rules. However, in the open and dynamic sanitation scenario, such a system often becomes fragmented. The more complex the situation, the thicker the rules pile up, and the whole system becomes fragile and difficult to generalize.

The COVA team has long worked on fundamentally optimizing the technical architecture of sanitation robots. According to Liao Wenlong, the intelligent capabilities of sanitation robots in the industry follow a five-stage evolution:

First stage: Can only execute scripts along a fixed route in a closed environment;

Second stage: Can autonomously complete fixed-route driving and operations on public roads, relying on high-precision maps;

Third stage: Does not strictly rely on high-precision maps and can adjust routes and strategies in real time according to the environment;

Fourth stage (physical intelligent agent): Ready to use; can autonomously plan routes and operation scripts and adapt to any urban environment without performance degradation when the scenario changes;

Fifth stage (cloud-edge integrated multi-agent system of physical intelligent agents): Multiple robots autonomously coordinate to meet urban service needs, achieving or approaching globally optimal resource allocation.

Currently, compared with similar enterprises in the market, COVA has taken the lead in stably implementing the core capabilities of the fourth stage and is in the critical process of steadily moving towards the fifth stage.
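For readers who think in code, the five stages can be written down as an ordered enumeration. This is only an illustrative sketch: the identifiers `AutonomyStage` and `can_self_plan` are hypothetical names chosen here, not terms used by COVA, and the helper simply encodes the article's statement that autonomous planning of routes and operation scripts begins at the fourth stage.

```python
from enum import IntEnum


class AutonomyStage(IntEnum):
    """The five stages described in the article (identifier names are illustrative)."""
    SCRIPTED_CLOSED_SITE = 1      # fixed-route scripts in a closed environment
    HD_MAP_FIXED_ROUTE = 2        # public roads, fixed routes, high-precision maps
    MAP_LIGHT_ADAPTIVE = 3        # adjusts routes and strategies in real time
    PHYSICAL_AGENT = 4            # ready to use, plans routes and scripts itself
    MULTI_AGENT_CLOUD_EDGE = 5    # multiple robots coordinate via cloud and edge


def can_self_plan(stage: AutonomyStage) -> bool:
    """True from the fourth stage on, where the robot plans its own routes and scripts."""
    return stage >= AutonomyStage.PHYSICAL_AGENT


for stage in AutonomyStage:
    print(stage.value, stage.name, can_self_plan(stage))
```

Using an ordered enum makes comparisons such as "at least stage four" explicit, which matches how the article describes each stage as building on the previous one.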

Now, with the continuous breakthroughs in the unified Physical AI Model (world model) technology, robots no longer need to stay at the stage of learning rules set by humans.

The concept of COVA's technical team is straightforward: "Whether in the future or at present, we believe that a unified Physical AI Model should handle all capabilities simultaneously, rather than the decoupled approach that many people are currently trying."

This concept is reflected in COVA's technical route based on the BEV World Model. Through pre-training on massive data, it can predict fuzzy future states and directly decode actions. This ability is similar to "intuition": when the wind blows the garbage, it knows where the garbage might drift; when it gets too close to a wall, it understands the consequences of a collision; when passing through an intersection, it assesses the intentions of others and the potential risks. This frees the system from the earlier engineering logic of mapping first, then planning, and then controlling.

At the same time, robots on the street must deal not only with the natural physical world but also with the world of human rules and conventions: the meaning of traffic lights, no-parking zone rules, the boundaries of tactile paving... These abstract symbols cannot be fully inferred from pixels. Therefore, on top of the unified physical model, COVA adds a Visual Language Model (VLM) as a bypass cognitive system that parses rules, signs, and intentions and guides actions in the form of strategy prompts. Liao Wenlong offered a vivid metaphor: "VLM is like the human brain, which thinks deeply when necessary and then guides the motor center."

Once the robot has intuition and a "brain," reinforcement learning makes it more reliable and stable. By making mistakes in a simulated environment, it can address not only long-tail scenarios it has never seen before but also the problem of keeping strategies consistent when multiple actions are coupled. Sweeping, edge-following, and obstacle avoidance are learned together under the unified model, taking both efficiency and safety into account.

Liao Wenlong summarized: "In simple terms, our architecture can be summarized as the World Action Model + VLM (Visual Language Model)."

In actual deployment, to truly achieve "ready to use," COVA has added two key capabilities to the model system. One is the Self-Memory mechanism: after the robot enters a new environment, the system automatically writes the road structure and key features it encounters, on first contact and afterwards, into the world model, achieving "one-time learning, long-term adaptation" so that the robot becomes more and more proficient. The other is prompt adjustment: for different regional traffic rules (such as left-hand traffic in Singapore and right-hand traffic in China) and operation requirements (such as key protection areas), the behavior strategy can be switched by changing the prompt, without retraining the model. This lets the robot enter production quickly and convert technological capability into real-world operational efficiency.
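To make the prompt-adjustment idea concrete, here is a minimal sketch of how per-region settings might be rendered into a strategy prompt while the model itself stays unchanged. Everything in it is an assumption made for illustration: the `StrategyPrompt` class, its fields, the prompt wording, and the example protected zone are hypothetical and do not describe COVA's actual interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StrategyPrompt:
    """Hypothetical per-deployment settings; not COVA's actual schema."""
    region: str
    driving_side: str                       # "left" or "right"
    protected_zones: Tuple[str, ...] = ()   # e.g. key protection areas

    def render(self) -> str:
        """Turn the settings into the text prompt handed to the (unchanged) model."""
        if self.driving_side not in ("left", "right"):
            raise ValueError(f"unknown driving side: {self.driving_side!r}")
        zones = ", ".join(self.protected_zones) or "none"
        return (
            f"Region: {self.region}. "
            f"Traffic keeps to the {self.driving_side}. "
            f"Protected zones: {zones}."
        )


# Same model weights, two deployments: only the prompt differs.
SINGAPORE = StrategyPrompt("Singapore", "left", ("school gate",))
CHINA = StrategyPrompt("China", "right")

print(SINGAPORE.render())
print(CHINA.render())
```

The point of the sketch is the separation of concerns: regional rules live in a small, editable configuration, so switching cities means swapping a prompt rather than retraining.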

Finally, a continuous, complete, positive-feedback intelligent link is formed: understanding the world, predicting consequences, deciding on actions, improving through trial and error, adapting to urban changes, and following human rules. AI has finally truly entered the physical world.

Behind the successful operation of this intelligent link is COVA's long-term accumulation over the past decade. Its aim is not a flashy appearance at the peak of a trend but solving the most difficult yet most valuable problem: how robots can deliver real productivity.

This also explains why, among companies that claim to "make robots," some are still stuck at the PPT stage and some can only demonstrate pre-set actions at exhibitions, while COVA's robots are already working independently in the city.

Ten-year accumulation: data, hardware, and industry understanding

In the field of embodied intelligence, which requires long - term investment and profound engineering capabilities, time brings not pressure but a moat.

It is not accidental that COVA can truly implement the unified physical AI model.

Let's go back to COVA's starting point. From the very beginning, the company has been exploring how to give robots real productivity. The name is the most direct declaration: "COVA Robotics," rather than "COVA Technology" or "COVA Intelligence." "We have always wanted to make AI robots with productivity value," Liao Wenlong told us.

This underlying belief is highly consistent with the team's technical background. He Tao, the founder and CEO of COVA, proposed the feature-driven algorithm theory for autonomous driving. He earned a bachelor's degree from the School of Electronic Information at Shanghai Jiao Tong University and completed his master's and doctoral studies at Tokyo Institute of Technology in Japan. After graduation, he returned to China and taught at Shanghai Jiao Tong University. CTO Liao Wenlong studied control theory as an undergraduate and shifted to AI during his doctoral studies. Most of the team members have technical backgrounds.

Ten years ago, COVA chose to enter the sanitation field. After in-depth research on the industry, the team found that sanitation robots must interact with complex environments and carry out operations in addition to moving. Therefore, when the Internet boom and capital trends drove the rapid expansion of RoboTaxi, COVA chose a more difficult and slower path: first developing robots for real-world urban service scenarios.

Over the past decade, they have accumulated deeply in three dimensions:

On the one hand, they control the hardware foundation. COVA develops everything in-house, from hardware design and manufacturing to the underlying software. This capability is comparable to Unitree's accumulation in hardware such as joint motors and motion control.

On the other hand, they have accumulated 50 PB of high-quality data from real robots. Compared with passenger cars, which mainly operate on main roads, COVA has a large amount of data from scarce scenarios such as sidewalks, parks, and auxiliary roads. Moreover, in the course of this continuous accumulation, COVA has built an efficient data-mining and automated-annotation pipeline, using self-supervised learning (using future moments to supervise the current moment) and a VLM (Visual Language Model) for automated annotation.
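The phrase "using future moments to supervise the current moment" can be illustrated with a toy example. The sketch below assumes a logged sequence of 2D robot positions and labels each frame with the displacement actually recorded a fixed number of frames later, so no human annotation is needed. The function name, the data layout, and the choice of displacement as the target are all assumptions made for illustration; the article does not describe COVA's pipeline at this level of detail.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def future_supervised_pairs(
    positions: List[Point], horizon: int
) -> List[Tuple[int, Point]]:
    """Pair each frame index with the displacement logged `horizon` frames later.

    The "label" for frame t comes from the recording itself (frame t + horizon),
    which is what makes the supervision automatic.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    pairs = []
    # The last `horizon` frames have no future to look at, so they get no label.
    for t in range(len(positions) - horizon):
        x0, y0 = positions[t]
        x1, y1 = positions[t + horizon]
        pairs.append((t, (x1 - x0, y1 - y0)))
    return pairs


log = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0)]
print(future_supervised_pairs(log, horizon=2))
# prints [(0, (2.0, 1.0)), (1, (2.0, 3.0))]
```

In a real system the inputs would be sensor frames rather than bare coordinates, but the principle is the same: the log's own future supplies the training target.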

A good product relies not only on data and hardware but also on an in-depth understanding of scenarios, which is another advantage COVA has built up over the past decade. Which operation strategy fits which scenario? Should the robot flush with water or sweep? Should it aim for quick completion or thorough coverage? When should it even break traffic rules in order to operate? What kind of roadside garbage counts as a violation? These standards come from years of experience and have become a barrier that later entrants find difficult to overcome.

After ten years of iteration, COVA has developed a combined capability that is rare in the industry: it can refine a robot to engineering-grade reliability and train a world model that truly understands the environment and executes tasks, and the two reinforce each other.

Most importantly, through continuous technological breakthroughs and thinking, COVA has gradually found the idea and solution for the Physical AI Model, which lays the foundation for its subsequent technological leap.

Hardware control, data accumulation, scenario awareness, and technological solutions are fragmented in most embodied intelligence companies, but COVA can make full use of all of them.

On the uncharted path of embodied intelligence, COVA's patience in solving long - term problems and its perseverance in the difficult but correct direction are the reasons for its current success. Therefore, when the wave of the unified world model technology arrives, COVA has truly seized the opportunity for this leap.

A broader market and more imaginative opportunities

The most direct test of whether a technology is mature is whether it can close the commercial loop. Can it replace human labor? Can it scale? Can it adapt to different cities? Can it maintain efficiency in long-term operation? The answers to these questions must be definite.

Relying on a decade of in-house development, COVA has found a balance among cost, capability, and scenario, and has made steady, in-depth progress in commercialization.

Under relatively ideal working conditions, a COVA AI sanitation robot can complete about 20-30 kilometers of operations per day, equivalent to the workload of 5-10 sanitation workers. Even if calculated based on the lower-limit annual salary of 30,000 yuan per sanitation worker, COVA's robots can still generate positive