
The latest insights from a16z: Five gaps that embodied intelligence must bridge from demos to real-world applications

Silicon-based Observation Pro · 2026-01-16 22:00
The watershed from feasibility to usability.

In the past two years, a highly repetitive scenario has emerged in the robotics industry.

In carefully edited videos, robotic arms gracefully manipulate unfamiliar objects, humanoid robots traverse complex terrain, and policy models complete tasks in previously unseen environments. Every product launch sparks discussion of model architectures, training scale, compute consumption, and benchmark results.

However, if we turn off the spotlight and ask a few rather awkward questions:

How many times was this demonstration filmed? If the camera is moved six inches to the left, can the system still succeed? And has it ever really left the laboratory?

These seemingly awkward questions actually mark the watershed between robotics that "seems feasible" and robotics that is "truly usable."

Not long ago, Oliver Hsu, an investor at a16z, wrote an article that systematically lays out the key factors restricting the large-scale deployment of embodied intelligence from the perspectives of engineering, deployment, and operations.

His core judgment is not that "the models aren't strong enough," but that the real bottleneck lies in translating research results into production systems.

Today, we will start from this article to dissect the real reasons why robotics technology has been slow to spread in the real world.

Starting from a frequently overlooked fact

The fact that deployment has been delayed does not mean that research has stagnated. On the contrary, robot learning may be in its most active stage in nearly a decade.

The emergence of Vision-Language-Action (VLA) models represents a paradigm-level change.

These models no longer treat robot control as an isolated motion-planning problem; instead, they bring internet-scale semantic understanding into the loop, unifying "language understanding," "visual perception," and "action generation" in a single modeling framework.

From Google's RT-2 to Physical Intelligence's π series, and on to GEN-0 and GR00T N1, this line of work keeps expanding the sources of training data, the diversity of robot embodiments, and the ability of policies to generalize across tasks and environments.

Sim-to-real transfer is also steadily improving. Domain randomization and world models are easing the old problem of simulators being "not realistic enough."

Cross-embodiment generalization is starting to become a consensus.

Open X-Embodiment puts more than a million trajectories from over 20 robot platforms into the same training framework, significantly improving the success rate of models on unfamiliar hardware.

Dexterous manipulation is no longer just a demonstration result. Models are beginning to handle deformable objects, tool use, and complex contact-rich tasks.

If we only look at the research progress, robot intelligence has almost crossed the threshold of "feasibility."

Five factors restricting the implementation of embodied intelligence

The problem is that these capabilities have hardly entered real - world production systems.

In factories, most industrial robots still execute highly deterministic processes: repetitive welding, fixed grasping, and pre-programmed operations. When product specifications change, the system doesn't "learn"; it gets reprogrammed.

Warehouse picking is one of the few scenarios that comes close to research capabilities. Even so, deployed systems usually handle only structured goods and operate under controlled lighting and fixed bin layouts. The laboratory ability to pick arbitrary items from a cluttered scene is still a long way from large-scale deployment.

As for humanoid robots, most are still in the pilot and demonstration stages. They are development platforms for researchers, not production tools that enterprises can directly purchase, deploy, and maintain.

An intuitive comparison is:

The main players in the research field are large-model laboratories and cutting-edge startups;

The main players in the deployment field are still industrial robot OEMs and regional system integrators.

These two systems have hardly truly merged.

Intuitively, people often attribute this gap to "the time needed for technology diffusion." But this is only part of the reason.

More crucially, deploying autonomous physical systems is an entirely different problem from research. Autonomous driving has already provided us with enough precedents.

When robots move from the laboratory to the production environment, they will face a whole set of technical and operational challenges simultaneously:

First and foremost is the illusion of success rate created by distribution shift.

Research systems are typically evaluated in environments that closely match the training data distribution. But the real world never stays within that distribution.

A policy with a 95% success rate in the laboratory may see its success rate plummet to 60% once it enters a warehouse, because lighting, background, viewpoint, object materials, and mechanical wear all change.

In other words, benchmark tests cannot cover this complexity. Research focuses on "average performance," while deployment has to handle "every situation," and a huge number of long-tail scenarios simply go uncovered.
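A rough back-of-the-envelope illustration of why the drop can be so steep (the per-factor numbers below are invented for illustration, not measurements from the article): even if each shifted factor only shaves a few points off the success rate, the effects compound.

```python
# Illustrative only: assume five independent shift factors (lighting, background,
# viewpoint, material, wear), each multiplying success probability by 0.92.
lab_success = 0.95
per_factor_retention = 0.92

deployed_success = lab_success * per_factor_retention ** 5
print(f"deployed success ~ {deployed_success:.2f}")  # ~0.63, close to the 60% figure above
```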

Second, the reliability threshold is another fundamental dividing line between research and production.

In academic papers, a 95% success rate is an excellent result; in production, a 95% success rate means dozens of failures every day.

Every failure means manual intervention, system interruption, and operational cost. Manufacturing systems usually require reliability above 99.9%, and the failures of learning-based policies tend to occur outside the training distribution and to be systematic rather than random.
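A simple count shows why these two numbers live in different worlds (the 1,000-picks-per-day workload is an assumed figure, purely for illustration):

```python
# Assumed workload for a single picking cell: 1,000 attempts per day.
picks_per_day = 1_000

for success_rate in (0.95, 0.999):
    failures_per_day = picks_per_day * (1 - success_rate)
    print(f"{success_rate:.1%} success -> ~{failures_per_day:.0f} manual interventions per day")

# 95.0% success -> ~50 manual interventions per day
# 99.9% success -> ~1 manual interventions per day
```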

Research aims to maximize performance, while production aims to minimize failures. These are two completely different objective functions.

Third, the paradox of computing power and latency.

The performance gains of VLA models come with larger parameter counts and higher inference latency, while robot control is extremely sensitive to real-time performance.

Manipulation tasks typically require control frequencies of 20-100 Hz. Even a 7B-parameter model struggles to meet this reliably on edge hardware, to say nothing of the network latency introduced by cloud inference.

Hence the rise of dual-system architectures that separate slow semantic reasoning from fast motion control. But this, too, introduces new system complexity.
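A minimal sketch of such a dual-system loop, assuming a threaded design in which the slow planner and the fast controller share only a goal variable (the frequencies, latency figures, and names below are illustrative assumptions, not details from the a16z article):

```python
import threading
import time

CONTROL_HZ = 50    # fast loop: roughly a 20 ms budget per control step (assumed)
PLANNER_HZ = 1     # slow loop: large-model reasoning, hundreds of ms is acceptable

latest_goal = {"subtask": "idle"}   # shared state written by the planner
goal_lock = threading.Lock()

def send_motor_command(goal):
    """Placeholder for the real actuation interface."""
    pass

def run_planner():
    """Slow 'System 2': semantic reasoning with a large model (stubbed here)."""
    while True:
        time.sleep(0.4)  # stand-in for a ~400 ms VLA/LLM inference call (assumed)
        with goal_lock:
            latest_goal["subtask"] = "move_to_bin_A"
        time.sleep(max(0.0, 1.0 / PLANNER_HZ - 0.4))

def run_controller():
    """Fast 'System 1': reactive control that never blocks on the planner."""
    period = 1.0 / CONTROL_HZ
    while True:
        start = time.monotonic()
        with goal_lock:
            goal = dict(latest_goal)   # always act on the latest available goal
        send_motor_command(goal)       # small, fast policy or classical controller
        time.sleep(max(0.0, period - (time.monotonic() - start)))

if __name__ == "__main__":
    threading.Thread(target=run_planner, daemon=True).start()
    run_controller()
```

The point of the split is that the control loop keeps its real-time budget even when the large model is slow; the price is precisely the extra coordination and failure modes the article warns about.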

Fourth, the underestimated "system integration".

Robots in real-world deployments must be embedded in a whole set of existing systems: warehouse management (WMS), manufacturing execution (MES), and ERP software, plus monitoring, compliance, and maintenance workflows.

If a policy cannot receive real-world task instructions, cannot cooperate with other equipment, and cannot report its own status, its value in a production environment is close to zero.
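As a rough illustration of what that integration surface looks like (the message fields and the reporting function are hypothetical, not taken from any particular WMS product), the policy ultimately has to sit behind a task-in / status-out contract something like this:

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum

class TaskStatus(Enum):
    ACCEPTED = "accepted"
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED_NEEDS_HELP = "failed_needs_help"

@dataclass
class PickTask:
    task_id: str      # issued by the WMS, not invented by the robot
    sku: str
    source_bin: str
    target_tote: str

@dataclass
class StatusReport:
    task_id: str
    status: TaskStatus
    detail: str = ""

def report_to_wms(report: StatusReport) -> None:
    """In production this would be sent to the WMS/MES endpoint; here we just print."""
    payload = {**asdict(report), "status": report.status.value}
    print(json.dumps(payload))

# Example: the robot accepts a task, then escalates a recoverable failure.
task = PickTask("T-1042", "SKU-889", "bin-A7", "tote-3")
report_to_wms(StatusReport(task.task_id, TaskStatus.ACCEPTED))
report_to_wms(StatusReport(task.task_id, TaskStatus.FAILED_NEEDS_HELP,
                           detail="grasp confidence below threshold"))
```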

What's more challenging is safety certification. Current standards are written for predictable, analyzable, programmed robots, not for neural-network-based policies. There is as yet no mature answer to the question of how to prove the safety of a model with billions of parameters.

Fifth, maintenance is the final real - world threshold.

Research systems are maintained by researchers, while production systems are maintained by technicians.

When a learning-based robot behaves abnormally, the problem may lie in perception, the policy, the controller, the hardware, or the system integration. "Debugging" model weights is not something the existing maintenance workforce is equipped to do.

This is not a single - point problem but a systematic gap.

What's more serious is that the above problems don't exist in isolation and often form a negative feedback loop:

Distribution shift leads to failures, failures increase manual intervention, intervention raises costs, costs limit scale, scale limits data, and insufficient data makes the distribution problem worse.

That's why the deployment gap cannot be solved by a single research breakthrough.

Bridging the gap: from "models" to "infrastructure"

To solve these problems, simply relying on large-model upgrades along the lines of GPT-5 is far from enough. What's needed is DevOps and infrastructure for the robotics field.

For example, on the data side, teleoperation infrastructure needs to be built so that robots can collect data while doing useful work. Only when robots create value through their labor, with the cost of data collection absorbed along the way, does this flywheel start to turn.
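A minimal sketch of what "collecting data while working" could look like in practice (the file layout and field names are assumptions for illustration): every teleoperated or autonomous work episode is logged with its observations, actions, and outcome, so paid work doubles as training data.

```python
import json
import time
from pathlib import Path

LOG_DIR = Path("episodes")   # assumed local episode store; a real system would upload
LOG_DIR.mkdir(exist_ok=True)

def log_episode(task_id, frames, outcome, operator="teleop"):
    """Persist one work episode so it can later be curated into training data.

    frames:  list of dicts, each holding an observation reference and the action taken
    outcome: "success" or "failure", labeled by the system or the human operator
    """
    episode = {
        "task_id": task_id,
        "timestamp": time.time(),
        "operator": operator,   # teleop vs. autonomous matters for later curation
        "outcome": outcome,
        "frames": frames,
    }
    path = LOG_DIR / f"{task_id}_{int(episode['timestamp'])}.json"
    path.write_text(json.dumps(episode))
    return path

# Example: two frames of a teleoperated pick, logged as a successful episode.
frames = [
    {"image": "frame_000.png", "action": [0.01, -0.02, 0.00, 1.0]},
    {"image": "frame_001.png", "action": [0.00, 0.01, -0.03, 0.0]},
]
log_episode("T-1042", frames, outcome="success")
```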

Alternatively, make the AI more reliable. Since errors cannot be eliminated, they should at least be made controllable: let robots learn to "fail gracefully" (for example, actively flag situations they cannot handle instead of simply crashing), with traditional code serving as a safety net.
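A minimal sketch of that idea, assuming the learned policy can expose some confidence signal (the threshold and the function names are illustrative, not a prescribed design): when the policy is unsure, a wrapper falls back to hand-written safe behavior and asks for help rather than acting anyway.

```python
CONFIDENCE_THRESHOLD = 0.8   # assumed cutoff; in practice tuned per deployment

def safe_step(observation, learned_policy, scripted_fallback, request_help):
    """Run the learned policy, but fall back to traditional code when it is unsure."""
    action, confidence = learned_policy(observation)
    if confidence >= CONFIDENCE_THRESHOLD:
        return action                       # normal case: trust the model
    # Graceful failure: escalate and do something safe instead of guessing.
    request_help(reason=f"low policy confidence ({confidence:.2f})")
    return scripted_fallback(observation)   # e.g., retract the arm, pause the line
```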

For edge deployment, efficient models such as Hugging Face's SmolVLA point in the right direction. The goal should be compact, purpose-built models, or chips designed specifically for robots, rather than stuffing general-purpose GPU workloads into a robot.

These capabilities determine whether robots can transform from being "smart" to being "reliable."

Unlike the software world, the physical world is too complex for a single product to dominate every scenario.

Robots are more likely to evolve as an ecosystem: general capabilities as the foundation, fine-tuning around specific tasks, and application boundaries that expand step by step.

This pattern of ecosystem evolution has pulled the robotics industry deep into the Sino-US technological competition.

A common view is that the United States leads in the "brain" (model capabilities) and is committed to building superintelligence, while China holds a dominant position in the "body" (the industrial chain and application scenarios).

The United States has the most advanced VLA models, but China has the largest deployment of industrial robots and the most complex manufacturing scenarios. If the US strategy is to push the upper limit higher, China's strategy is to spread the applications more widely.

In the race ahead, whoever solves the "deployment gap" first can turn technological advantage into enormous economic value. Whoever builds the bridge that takes laboratory demos into thousands of factories and households will win the next era.

This also explains why the robotics deployment gap is so closely tied to the divergence of Chinese and US AI paths. A lead in model capability doesn't automatically translate into economic value, while deployment capability often determines the eventual scale of the industry.

This is not only a test for this generation of robotics companies but also a competition that hasn't started yet.

This article is from the WeChat official account "Silicon-based Observation Pro." Author: Silicon-based Jun. Republished by 36Kr with authorization.