Physical AI is trending. Here are some of my new thoughts.
Physical AI is the ultimate form of AI development. It needs to understand not only human instructions but also all the laws of the physical world.
A term has been circulating widely in the industry lately: "Physical AI".
In fact, Jensen Huang mentioned this term more than a dozen times in his speech at CES in Las Vegas at the beginning of last year. It was not until this year, however, that "Physical AI" truly took off.
So, what exactly is "Physical AI"?
The other day, I saw a video of a robot watering flowers. The robot first walked to the faucet, turned on the valve to fill the kettle, then turned around and walked to the flower pot, adjusted the angle, and evenly poured the water in. The spout of the kettle didn't hit the edge of the flower pot, and no water spilled.
For a machine to understand "carrying a glass of water", it has to know that the glass is cylindrical, calculate how much force will hold it without slipping or breaking, understand that water is a liquid that spills when shaken, and adjust the arm angle in real time while walking to offset the body's motion.
A three-year-old child can do all of this intuitively. For AI, it is a huge leap. Over the past decade, AI has learned to see, listen, speak, and draw, but it has always been confined to the screen. Physical AI has to put this smart brain into a body that can run, jump, grab, and release in the real world.
Put simply, Physical AI means making AI understand and act on the physical world. It no longer just processes text and images; it has to take correct actions in an environment where gravity, friction, and inertia all play a role.
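The "how much force" part of the glass-of-water example can be made concrete with a rough sketch. It assumes a simple two-finger pinch grip, in which friction at the two contact surfaces must carry the weight of the glass plus any extra load from the hand accelerating during walking. The function name, the safety factor, and all numeric values are illustrative assumptions, not figures from any real robot.

```python
G = 9.81  # gravitational acceleration in m/s^2


def grip_force_window(mass_kg, friction_coeff, max_accel=0.0,
                      break_force_n=float("inf"), safety=1.5):
    """Return (min_force, max_force) in newtons for a two-finger pinch grip.

    Each of the two contact surfaces contributes friction mu * N, so the
    glass does not slip while 2 * mu * N >= m * (g + a), where a is the
    worst-case upward acceleration of the hand. The upper bound is the
    force at which the glass would break (hypothetical value).
    """
    if mass_kg <= 0 or friction_coeff <= 0:
        raise ValueError("mass and friction coefficient must be positive")
    min_force = safety * mass_kg * (G + max_accel) / (2.0 * friction_coeff)
    if min_force > break_force_n:
        raise ValueError("no safe grip: slip threshold exceeds breaking force")
    return min_force, break_force_n


# Illustrative numbers only: a 0.4 kg glass of water, rubber-on-glass
# friction of 0.5, and up to 2 m/s^2 of hand acceleration while walking.
lo, hi = grip_force_window(mass_kg=0.4, friction_coeff=0.5,
                           max_accel=2.0, break_force_n=40.0)
print(f"squeeze between {lo:.1f} N and {hi:.1f} N")
```

Even this toy version shows why the task is hard: the safe window depends on quantities the robot has to estimate on the fly (mass, friction, its own motion), which a child handles without any explicit calculation.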
A fact rarely discussed in China is that the term "Physical AI" did not come from the public relations department of a certain chip giant. This concept first appeared in a paper published in Nature Machine Intelligence in 2020. In the paper, Physical AI was systematically defined for the first time:
A class of physical systems capable of performing tasks usually associated with intelligent organisms. The core idea is to integrate physical laws deeply into the artificial intelligence system, so that machines are no longer "physically blind" and can close the loop from perception to action.
Six full years separate the first shot fired in academia in 2020 from the full takeover by industry in 2026. In those six years, the cost of sensors fell by several orders of magnitude, edge AI computing moved from theory to engineering practice, and the reliability and mass-production capacity of robot bodies quietly reached a critical point. These are the hidden forces that carried Physical AI from papers to production lines.
From Demonstration to Work
If large language models taught AI to chat in 2023, then Physical AI in 2026 has only one keyword: work.
The change is plain to see.
At this time last year, the way for robot companies to show their strength was to shoot demo videos. They set up the scene, rehearsed repeatedly, and shot in one take. It looked good, but you didn't know how many times they had shot it.
This year, the game has completely changed. Zhipu Robotics did something different on a 3C (computer, communication, and consumer electronics) production line in Nanchang: it put a robot into a real factory and let it work continuously for several hours, with the whole process live-streamed. There was no preset script and no restricted scene, just the production line that workers face every day. Hundreds of thousands of people watched the live stream online.
One month later, Zhipu announced the mass production of 10,000 humanoid robots in Hong Kong. Going from a laboratory prototype to 10,000 units off the production line is a real hurdle, and once it is cleared, this is a different kind of business.
Zhipu's approach is very interesting. Most robot start-ups focus on a single link in the chain. Those making the body focus only on the body, those making large models focus only on large models, and those making dexterous hands focus only on hands. Zhipu chose another way: it works across the whole stack, pursuing four directions at once (body manufacturing, AI models, dexterous manipulation, and data collection) while also investing in more than 60 companies upstream and downstream in the industrial chain.
The cost of this approach is equally obvious. The parent company already has more than a thousand employees, and headcount is expected to grow further by the end of this year. Payroll alone runs to one to two billion yuan a year. This path burns money, but if it succeeds, it will also build the highest barrier.
Deng Taihua, the founder of Zhipu, proposed an analysis framework called the "XYZ curve". He said that the development of embodied intelligence is divided into three stages: X is the development and trial-use period, where people are still playing with demos; Y is the deployment and growth period, where robots start to really work on the production line; Z is the final intelligent emergence period.
He defined 2026 as: "The first year of the deployment state, officially moving from 'able to move' to 'able to work'". The difference between "able to move" and "able to work" is just one word, but it represents the coming-of-age ceremony of the entire industry.
Overseas is also in a sprint. The pace on the other side of the Pacific is not slow at all.
The American humanoid robot company Figure AI is an unavoidable name in this field. In September last year, they completed a round of financing of over one billion US dollars, and their valuation reached 39 billion US dollars. At that time, it was the humanoid robot company with the highest valuation in the world.
One month later, they released their new-generation product, Figure 03. It is 1.68 meters tall and weighs about 60 kilograms. It demonstrated household chores such as watering flowers, serving dishes, and folding clothes. The founder, Brett Adcock, specifically added on social media that all the actions were completed autonomously by the robot, with no one controlling it remotely.
On the technical side, it is worth noting that Figure made a major change of route. They terminated their cooperation with OpenAI and switched fully to their self-developed neural network system, Helix.
This system uses a three-layer structure that imitates human cognition. The bottom layer is responsible for balance and instinctive reactions, the middle layer translates the brain's instructions into motor control 200 times per second, and the top layer is the logical brain, responsible for understanding the scene and making decisions. This three-layer "instinct, reflex, thinking" architecture is quite ingenious: it amounts to giving the robot a nervous system that does not crash.
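The layered design described above can be sketched as a multi-rate control loop, in which each layer runs at its own frequency and the faster layers keep running between updates from the slower ones. This is only a schematic under stated assumptions, not Figure's implementation: the 200 Hz middle layer comes from the text, while the tick rate, the bottom-layer rate, the top-layer rate, and all names are hypothetical.

```python
TICK_HZ = 1000     # simulation tick rate (assumption)
REFLEX_HZ = 1000   # bottom layer: balance and reflexes, every tick (assumption)
MOTOR_HZ = 200     # middle layer: 200 motor updates per second (from the text)
PLANNER_HZ = 10    # top layer: scene understanding and decisions (assumption)


def run(duration_s):
    """Simulate the three layers and count how often each one updates."""
    counts = {"reflex": 0, "motor": 0, "planner": 0}
    goal = None      # latest decision from the top layer
    command = None   # latest motor command from the middle layer
    for tick in range(int(duration_s * TICK_HZ)):
        # Top layer: slow, deliberate; refreshes the goal occasionally.
        if tick % (TICK_HZ // PLANNER_HZ) == 0:
            goal = f"goal decided at tick {tick}"
            counts["planner"] += 1
        # Middle layer: turns the most recent goal into a motor command.
        if tick % (TICK_HZ // MOTOR_HZ) == 0:
            command = (goal, tick)
            counts["motor"] += 1
        # Bottom layer: runs every tick, so balance is maintained even
        # while the slower layers are between updates.
        counts["reflex"] += 1
    return counts


print(run(1))  # {'reflex': 1000, 'motor': 200, 'planner': 10}
```

The design point is that a slow or stalled top layer does not stop the lower layers: they keep acting on the last goal they received, which is one way to read the "does not crash" claim.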
Something else is worth mentioning. This year, NVIDIA announced at the GTC conference that it had reached in-depth cooperation with the world's four major industrial robot giants: ABB, KUKA, Yaskawa, and Fanuc. In the future, the more than two million industrial robots already installed on production lines around the world can be debugged virtually and trained with AI through NVIDIA's simulation platform.
These four companies together account for more than half of the global industrial robot market. In the next decade, these robots will face an upgrade from "traditional programming" to "AI-driven". Whichever software platform embeds itself in this process will effectively own the "operating system" layer of next-generation industrial automation. Obviously, NVIDIA doesn't want to miss this ticket.
The Cross-Border Sprint of the Supply Chain
There is also an interesting phenomenon: automotive supply-chain companies are swarming into the Physical AI field.
At the Beijing Auto Show this year, established automotive suppliers such as Aptiv, Valeo, Horizon Robotics, and Qianxun Spatial Intelligence showcased robot-related solutions side by side. Many industry insiders realized then that perception for embodied intelligence is the same as perception for automotive intelligent driving, and that automotive solutions can be applied directly to humanoid robots.
On second thought, it is indeed the case. An automotive intelligent driving system is essentially a closed loop of perception, decision, and execution for a "mobile robot". Its three major modules (visual perception, path planning, and real-time control) are highly homologous in technical architecture to those of traditional industrial robots and humanoid robots.
The cameras, radars, drive-by-wire chassis, and real-time operating systems in the hands of automotive suppliers can be adapted and migrated to robots with little effort. In this sense, the hundreds of billions in R&D spending that the automotive industry burned on intelligent driving over the past decade are flowing into Physical AI as "technology spill-over".
This may explain why Chinese robot companies can reach the mass-production stage so quickly. Manufacturing capabilities and supply-chain management don't come out of thin air; many are already in place. Parts suppliers that have been honed on automotive production lines for more than a decade are now entering a new battlefield.
There is a ready-made example abroad: Tesla. Its first-generation humanoid robot, Optimus, is also speeding toward production. On its Q1 2026 earnings call, Tesla stated plainly that the company will transform toward a future centered on AI, robotaxis, and humanoid robots. The production line for the first-generation robot will have a capacity of one million units and will replace the existing Model S and Model X production lines.
The figure of one million units may seem exaggerated today, but Tesla's logic is clear: it wants to carry the large-scale production capabilities and supply-chain management experience it accumulated in automotive manufacturing directly over to humanoid robots.
What Elon Musk wants is not just a "movable robot" but a "mass-production tool" that can work alongside humans in the factory. If this path succeeds, its impact on manufacturing automation will be no less than that of the Model 3 on the fuel-powered vehicle market.
Why Can the World Model Be Used Suddenly This Year?
After talking about the actions of large companies at the industrial level, let's take a deeper look. What is the technological foundation of this Physical AI competition?
If summarized in one sentence, it is the engineering breakthrough of the world model. I think this is also the most crucial point in understanding this wave.
The concept of the "world model" is not new. It was proposed in 2018, and the core idea is very simple: let AI learn an internal understanding of how the physical world operates, so that it can predict "what will happen if I push this glass". In the past, however, the concept existed almost only in papers: it was too computationally intensive, the generation quality was unstable, and it could not support real-time interaction.
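The "what will happen if I push this glass" idea can be illustrated with a deliberately tiny stand-in for a world model. Real world models are large neural networks trained on video; the sketch below instead fits a single physical parameter (a friction coefficient) from a few observed pushes and then uses it to predict the outcome of a push it has not seen. The sliding-friction formula, the noise-free data, and all names are assumptions made for illustration.

```python
G = 9.81  # gravitational acceleration in m/s^2


def fit_friction(observations):
    """Estimate the friction coefficient from (push_speed, slide_distance) pairs.

    Sliding physics gives d = v^2 / (2 * mu * g), so distance is linear in
    v^2 with slope k = 1 / (2 * mu * g). We fit k by least squares through
    the origin and solve for mu.
    """
    num = sum((v * v) * d for v, d in observations)
    den = sum((v * v) ** 2 for v, _ in observations)
    k = num / den
    return 1.0 / (2.0 * G * k)


def predict_slide(mu, push_speed):
    """Predict how far the glass slides (in meters) for a given push speed."""
    return push_speed ** 2 / (2.0 * mu * G)


def safe_to_push(mu, push_speed, distance_to_edge):
    """True if the predicted slide stops before the edge of the table."""
    return predict_slide(mu, push_speed) < distance_to_edge


# Hypothetical, noise-free "experience" generated with a true mu of 0.3.
experience = [(v, v ** 2 / (2.0 * 0.3 * G)) for v in (0.5, 1.0, 1.5)]
mu = fit_friction(experience)

print(f"learned mu = {mu:.2f}")
print("gentle push safe:", safe_to_push(mu, 1.0, distance_to_edge=0.25))
print("hard push safe:  ", safe_to_push(mu, 2.0, distance_to_edge=0.25))
```

The structure is the point: learn from past interaction, then imagine the result of an action before taking it. A full world model replaces the one-parameter formula with a learned predictor over images or video.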
The turning point occurred in the past year. NVIDIA launched a series of models called Cosmos, and its core ability is to generate action data that conforms to physical laws from text or images.
For example, if you want to train a robot to learn to move boxes in various weather conditions, you don't need to actually go to the factory to shoot videos in rainy days, snowy days, or at midnight. By setting parameters in the simulation environment, Cosmos can directly generate a large amount of highly realistic training data covering various extreme scenarios.
At the beginning of this year, the Ant Lingbo team open-sourced a framework called LingBot-World, built specifically for interactive world models. It can sustain continuous, stable video generation for nearly 10 minutes, with end-to-end interaction delay kept within seconds. Users can control virtual characters in real time with the keyboard and mouse, as in a game, and the model immediately reflects the resulting scene changes. The significance is that the world model has moved from "offline rendering" to "online interaction", and training efficiency has improved by an order of magnitude.
There is also a start-up, Jijia Vision, which released the GigaWorld-1 platform and positions it as a "digital sandbox" of the physical world. One month later, Alibaba's ABot-PhysWorld surpassed it on a benchmark called WorldArena and took first place in the overall ranking. The competition is advancing month by month.
The importance of these open-source projects lies not in how large their parameter counts are, but in turning a game that only giants could play into a tool that small teams can also use. When enough people are making wheels, more cars will actually run.
The world model has become a core element of the Physical AI era because it answers a long-standing question: how can robots learn the complex laws of the physical world cheaply and efficiently?
Obtaining training data in the real world is extremely expensive, and such data is inherently biased in its distribution. It is very difficult to capture every edge case in reality, such as a factory at night during a blizzard, a power outage in a logistics warehouse, or a production-line worker suddenly stepping in. Synthetic data can cover them. By controlling scene parameters with prompts in a simulation environment, researchers can generate large volumes of training video covering extreme conditions within a few hours, work that would take months or even years through traditional real-world collection.
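The idea of "controlling scene parameters with prompts" can be sketched as follows. The code only samples scene descriptions and renders them as text prompts; it does not call Cosmos or any other real generation API, and the parameter lists, the class, and the prompt wording are all hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical parameter ranges, chosen to include the rare conditions
# that are hard to capture in real-world footage.
WEATHER = ("clear", "rain", "snow", "blizzard")
LIGHTING = ("daylight", "dusk", "night", "power outage")
EVENTS = ("none", "worker steps into the cell", "box dropped", "conveyor stalls")


@dataclass(frozen=True)
class Scene:
    weather: str
    lighting: str
    event: str
    box_mass_kg: float

    def to_prompt(self):
        """Render the scene as a text prompt for a video generator."""
        return (f"A robot moves a {self.box_mass_kg:.1f} kg box in a factory, "
                f"weather: {self.weather}, lighting: {self.lighting}, "
                f"event: {self.event}.")


def sample_scenes(n, seed=0):
    """Sample n scenes uniformly; the seed makes the batch reproducible."""
    rng = random.Random(seed)
    return [
        Scene(
            weather=rng.choice(WEATHER),
            lighting=rng.choice(LIGHTING),
            event=rng.choice(EVENTS),
            box_mass_kg=round(rng.uniform(1.0, 25.0), 1),
        )
        for _ in range(n)
    ]


for scene in sample_scenes(3):
    print(scene.to_prompt())
```

Uniform sampling gives a blizzard at night the same weight as a clear day, which is exactly the kind of coverage that real-world collection struggles to provide. In practice the sampling distribution would be tuned toward the failure cases that matter most.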
The leverage effect of this breakthrough may exceed any single algorithm improvement.
The Paradigm Has Changed
The breakthrough of the world model is actually just a part of the evolution of the Physical AI technology stack. The changes in the underlying technology are promoting the reconstruction of the entire robot industry's architecture.
Traditional robots use a three-stage approach of "perception, planning, control". Sensors first perceive the environment, engineers write rules that tell the machine how to plan its path, and finally the machine executes the action. This works well in a structured environment such as a factory assembly line, but its shortcomings show as soon as the scene becomes complex. The machine can only follow the preset script and gets stuck when it meets an unfamiliar situation.
Physical AI takes another path: "perception, reasoning, execution". After perception, instead of following rules written by humans, a trained neural network reasons by itself what to do and then executes. The essential difference is that the former is "engineers thinking for the machine", while the latter is "the machine understanding the physical world by itself".
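The contrast between the two approaches can be shown with a toy example. The rule table stands for the hand-written "planning" stage, and a nearest-neighbor lookup over a few demonstrations stands in for the trained neural network; it is not a neural network, only the simplest thing that generalizes from examples. All object names, features, and actions are hypothetical.

```python
# "Perception, planning, control": engineers enumerate the cases in advance.
RULES = {"box": "grasp_from_top", "pallet": "lift_with_forks"}


def rule_based_plan(label):
    """Follow the preset script; anything without a rule stops the machine."""
    return RULES.get(label, "halt")


# "Perception, reasoning, execution": behavior comes from examples.
# Each demonstration is ((width_m, depth_m, mass_kg), action).
DEMOS = [
    ((0.30, 0.30, 2.0), "grasp_from_top"),
    ((1.20, 1.00, 15.0), "lift_with_forks"),
    ((0.07, 0.07, 0.3), "pinch_grasp"),
]


def learned_policy(features):
    """Pick the action of the most similar demonstration (1-nearest neighbor)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(DEMOS, key=lambda demo: squared_distance(demo[0], features))[1]


# An unfamiliar object: a crate that no rule mentions.
print(rule_based_plan("crate"))            # halt
print(learned_policy((0.35, 0.28, 3.0)))   # grasp_from_top
```

The rule-based planner stops because "crate" was never written into its table, while the example-driven policy treats the crate as similar to a box it has seen. Real systems replace the lookup with a network trained on far more data, but the division of labor is the same.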
The international robot standards organization released a technical roadmap this year, predicting that within the next three years, 80% of new models will adopt this new architecture, and the traditional three-stage solution will gradually withdraw from the mainstream. This is not a minor adjustment but a complete change of the paradigm.
One industry expert put it in a way I find quite accurate: Physical AI is the ultimate form of AI development, because it needs to understand not only human instructions but also all the laws of the physical world.
Jensen Huang has said that the ChatGPT moment for robotics has arrived. In my opinion, the "ChatGPT moment" of Physical AI is completely different from that of the language model. The language model's "moment" was when ordinary people around the world first used AI with their own hands; Physical AI's "moment" is when AI truly starts to work for the first time.
Now, this field is in a very special stage: the direction is locked, the concept is recognized, but the pattern is not yet set.
On the one hand, demonstration and mass production call for two completely different sets of capabilities. A working prototype is one thing; shipping 10,000 units into real-world scenarios tests manufacturing consistency, supply-chain resilience, scenario generalization, and the operations and maintenance system. These have little to do with AI algorithms, but any one of them is enough to stop a group of players. On the other hand, collecting data in the real world is expensive, slow, and narrow in coverage, which all but guarantees that large-scale training of Physical AI will rely heavily on synthetic data.
At the same time, industries that seem to have little to do with "AI", such as the automotive supply chain, traditional industrial automation, and consumer-electronics contract manufacturing, are accelerating their entry into Physical AI through technology spill-over. Their manufacturing capabilities, supply-chain management experience, and scenario resources may be the key variables that determine how fast Physical AI is deployed.
An intuitive judgment: in the AI wave triggered by ChatGPT at the beginning of 2023, those who captured the most value were not the model makers but the infrastructure providers. Will the Physical AI wave play out the same way?
NVIDIA's layout implies that it is betting on this direction, but the story is not over yet. 2026 is the first year of the deployment state, and the industrial competition has just begun. Looking back at this moment three years later, which names will still be in the game and which will be out may surprise most people.