Physical AI has become a sensation overnight, simply because it can “move bricks.”
Since the start of 2026, a new buzzword has taken hold in AI circles: "Physical AI".
Jensen Huang said several times at CES at the beginning of the year that "the next wave of AI will be AI operating in the physical world." Justin Sun also claimed publicly not long ago that "the dividends of virtual AI are exhausted, and Physical AI is the biggest opportunity in the next three years."
On the industry side, the star company Figure AI set the internet abuzz with a five-day continuous live-stream of robots sorting packages. In China, Zhipu Robotics announced that its 10,000th general-purpose embodied robot had rolled off the production line...
The statements of industry leaders and the real progress in embodied intelligence have drawn the industry's attention to a grand narrative: the transition from virtual intelligence to physical execution. Many people still have questions, however. Is this so-called "Physical AI" an inevitable turning point in technological development, or a well-packaged rebranding of an existing concept?
01 From "able to chat" to "able to do things"
Before answering these questions, let's first unpack this rather stiff piece of jargon.
Taken literally, Physical AI is artificial intelligence technology that is deeply integrated with the physical world. At its core, though, the distinction is this: virtual AI is responsible for "thinking and communication", while Physical AI must "perceive and act". It is no longer just an intelligent agent on a screen; it lets machines perceive, understand, and carry out complex operations in the real physical world.
In other words, Physical AI is a technology that "enables autonomous machines (such as robots, self-driving cars, etc.) to perceive, understand, and execute complex operations in the real physical world." Wang Xiang, an executive member of the China Computer Federation, laid out the concept systematically at the Third China International Supply Chain Expo: "Physical AI means that the AI system has the closed-loop ability of 'perception-reasoning-action-feedback' in the real world."
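The "perception-reasoning-action-feedback" loop can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions: the one-dimensional gripper, the noise level, the proportional gain, and all function names are hypothetical and do not come from any system mentioned in this article.

```python
import random


def perceive(true_position: float, rng: random.Random) -> float:
    """Perception: a noisy sensor reading of the gripper position (assumed noise)."""
    return true_position + rng.gauss(0.0, 0.01)


def reason(observed: float, target: float, gain: float = 0.5) -> float:
    """Reasoning: pick a correction proportional to the remaining error."""
    return gain * (target - observed)


def act(true_position: float, command: float) -> float:
    """Action: apply the commanded move to the simulated actuator."""
    return true_position + command


def run_closed_loop(start: float, target: float, steps: int = 30, seed: int = 0) -> float:
    """Run the loop for a fixed number of steps and return the final position."""
    rng = random.Random(seed)
    position = start
    for _ in range(steps):
        observed = perceive(position, rng)   # perception
        command = reason(observed, target)   # reasoning
        position = act(position, command)    # action
        # Feedback: the updated position is what the next perception step measures.
    return position


final_position = run_closed_loop(start=0.0, target=1.0)
print(f"final position: {final_position:.3f}")
```

The point of the loop is that each action is corrected by the next observation, so the system does not depend on a perfectly scripted sequence of moves.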
Put simply, earlier AI was "able to chat", while Physical AI is "able to do things". Getting AI out of the ChatGPT dialog box and into real-world factories, warehouses, and homes is the problem Physical AI has to solve.
This difference is especially evident in what two star robotics companies have done this year.
One is Figure AI in the United States, which used a five-day continuous live-stream to prove that "robots can really work". The live-stream began on May 14th and showed three Figure 03 humanoid robots taking turns sorting express packages on a production line. Their task was to detect the barcode, grab the package, reorient it, and place it on the conveyor belt with the barcode facing down.
During the live - stream, one robot worked continuously for more than 33 hours and processed more than 40,000 packages. The founder, Brett Adcock, said that the robots used the company's latest Helix 02 model and operated in "fully autonomous mode".
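As a rough sanity check, the two figures above imply an average sorting pace. The calculation below is a back-of-the-envelope sketch: both inputs are stated only as "more than", so the results are approximations derived here, not numbers reported by Figure AI.

```python
HOURS_WORKED = 33          # "more than 33 hours", so treated as an approximate value
PACKAGES_SORTED = 40_000   # "more than 40,000 packages", also approximate


def packages_per_hour(packages: int, hours: float) -> float:
    """Average number of packages handled per hour."""
    return packages / hours


def seconds_per_package(packages: int, hours: float) -> float:
    """Average time spent on one package, in seconds."""
    return hours * 3600 / packages


rate = packages_per_hour(PACKAGES_SORTED, HOURS_WORKED)
cycle = seconds_per_package(PACKAGES_SORTED, HOURS_WORKED)
print(f"{rate:.0f} packages/hour, {cycle:.2f} s per package")
# prints "1212 packages/hour, 2.97 s per package"
```

That works out to roughly one package every three seconds, sustained for well over a day.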
The significance of Figure AI's live-stream lies not only in demonstrating its technical capabilities but also in using live footage to tell the world that Physical AI has moved past the "laboratory demonstration" stage. Robots working continuously on a production line for several days, on camera and without major problems, is itself a powerful technological statement.
Zhipu Robotics in China held a similar live-stream. It put its Zhipu Elf G2 to work alongside humans on the MMIT (Multimedia Integration) tablet production line in the Nanchang Longqi Technology Industrial Park. Data measured during the live-stream showed zero major abnormalities over 8 hours of continuous operation and an overall operation success rate above 99.5%. A single process took only 18-20 seconds, the robot could complete 310 products per hour, and one robot could take on the workload of two processes.
Going a step further than Figure AI, Zhipu Robotics officially announced in March that it had become the first in the world to roll 10,000 general-purpose embodied intelligent robots off the line and deliver them. The jump from 5,000 to 10,000 units took only a little over three months, from December 2025 to March 2026.
Beyond delivery volume, Zhipu Robotics revealed that it plans to reach 10 billion yuan in revenue in 2027. Measured against the past development of frontier industries such as new energy, self-driving, and chips, a company that is less than two years old, has mass-produced and delivered units at the 10,000 scale, and has set a 10-billion-yuan revenue target counts as a phenomenon in the hard-tech field.
With solid data and real scenarios, these two companies have shown that Physical AI no longer needs remote control or pre-set scripts to "perform", and can complete complex tasks independently in a real environment.
More importantly, Zhipu Robotics was the first to cross the 10,000-unit delivery threshold, tying mass-production capacity to orders in hand. This indicates that the sector has reached a turning point from "technology verification" to "commercial realization". In other words, the "feasibility" of Physical AI is no longer in question, and the real competition has moved into the deep water of "usability" and "economics".
02 The technological driving force behind the explosion of Physical AI
So why did Physical AI suddenly take off this year? In hindsight, besides real commercial demand, the biggest driving force has been a series of technological breakthroughs.
First, large language models (LLMs) have given robots the ability to "understand". Traditional robots rely on deterministic code and rule-based programming, which amounts to engineers writing a "script" in advance. Every action of the robot strictly follows that script. This approach has a major flaw: if the robot's working environment changes even slightly, the code has to be rewritten. Such systems are not robust, which makes it hard for them to cross the threshold of commercialization.
That changed after Google began combining LLMs with the physical execution of robots, launching embodied multimodal large models such as PaLM-E and RT-2 in 2023. These models let robots take natural-language instructions, break complex tasks down into steps automatically, and execute them. The large language model thus made the leap from "dialogue understanding" to "physical execution".
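The idea of turning one natural-language instruction into an ordered list of executable steps can be sketched in a few lines. This is a toy illustration, not how PaLM-E or RT-2 work internally: the language model is replaced by a fixed lookup table, and the `Step` type, the skill names, and the sample instruction are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    skill: str    # name of a low-level robot skill (hypothetical)
    target: str   # object or location the skill acts on


# Stand-in for a language model: a fixed mapping from instruction to plan.
# A real system would generate the plan; this table is purely illustrative.
TOY_PLANS = {
    "put the cup on the shelf": [
        Step("locate", "cup"),
        Step("grasp", "cup"),
        Step("move_to", "shelf"),
        Step("release", "cup"),
    ],
}


def decompose(instruction: str) -> list[Step]:
    """Map an instruction to an ordered plan, or fail loudly if none is known."""
    key = instruction.strip().lower()
    if key not in TOY_PLANS:
        raise ValueError(f"no plan known for: {instruction!r}")
    return TOY_PLANS[key]


def execute(plan: list[Step]) -> list[str]:
    """Pretend to run each step in order and return a log of what was done."""
    return [f"{step.skill}({step.target})" for step in plan]


log = execute(decompose("Put the cup on the shelf"))
print(log)
# prints ['locate(cup)', 'grasp(cup)', 'move_to(shelf)', 'release(cup)']
```

The contrast with a traditional script is that here the plan is data produced per instruction, whereas a rule-based robot has the sequence fixed in its code.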
Jensen Huang pointed to the essence of this technological evolution in his speech at CES 2026: Physical AI is in effect a handover of low-level control. Once Physical AI crosses the critical point of technological evolution, control passes from deterministic code written by humans to neural networks that can generalize and that have an understanding of physical laws.
At this time, robots no longer just "execute code" but have the ability to "understand instructions and plan actions by themselves".
If the large language model solves the problem of "understanding", then the world model solves the problem of "acting in the physical world". The core of the world model is to enable AI to learn an internal understanding of the operating laws of the physical world.
NVIDIA's release of Cosmos, its world foundation model platform for Physical AI, at last year's CES was a landmark event. The model's core ability is to generate action data that conforms to physical laws from text or images. Developers can use Cosmos to accelerate Physical AI development for intelligent cars, robots, and video-analysis AI agents.
According to NVIDIA, Cosmos was trained on more than 20 million hours of real data, which greatly reduces the difficulty of simulation and model training. With a world model, an AI system can rehearse at massive scale in a virtual environment and then transfer what it has learned to the real physical world.
The ultimate ability of a robot is not just "seeing" or "understanding", but "doing correctly". The Vision-Language-Action (VLA) model lets robots process visual input, language understanding, and action control at the same time, closing the loop from "seeing" to "doing".
In September last year, DeepMind released Gemini Robotics 1.5, a new-generation multimodal embodied model, claiming that it is the world's first thinking model optimized for embodied reasoning. NVIDIA launched Isaac GR00T N1.6, an open-source model designed specifically for humanoid robots that can unlock full-body control.
Meanwhile, the Beijing Humanoid Robot Innovation Center open-sourced XR-1, an embodied "cerebellum" model that became the first model in China to meet the national standard for embodied intelligence. It was trained on more than one million data items and can complete complex dual-arm manipulation tasks such as picking, placing, pushing, pulling, and rotating.
At this point, Physical AI has assembled the basic technical capabilities it needs for deployment. LLMs let machines "understand" human intentions, world models let machines "predict" physical consequences, and VLA models bridge the last mile from "seeing" to "doing correctly". Together, the three give robots, for the first time, the basic ability to execute tasks independently in an open environment.
Of course, dexterous manipulation remains a bottleneck, and many problems in the fine control of two arms and hands are still unsolved. In other words, Physical AI has earned its entry ticket to "work in the factory", but to truly "enter the home and serve tea", it still has to make the qualitative leap from "rough actions" to "refined operations".
03 From technological vision to delivery ability
Understanding where Physical AI came from and where it stands matters. The question the embodied intelligence industry now faces is: around which core dimensions will the next round of competition unfold?
The development of self-driving offers a lesson. A data war was inevitable for self-driving, and embodied intelligence, which follows a similar logic, cannot avoid one either. Generally speaking, whoever has higher-quality training data has the upper hand.
In the industry today, NVIDIA was the first to build a world-model barrier with Cosmos: a model trained on more than 20 million hours of real data is hard to replicate quickly. Zhipu Robotics has mass-produced and deployed 10,000 robots, which gives it the ability to collect real, feedback-driven data, and this is widely regarded in the industry as a data moat.
Note that the data competition in Physical AI is not simply about who has more. It requires synthetic data and real data to work together.
Relying on real data alone runs into problems of scale and hardware wear-and-tear costs. Relying too heavily on synthetic data leaves a transfer gap between simulation and reality (sim2real). The "cross-data-source learning" approach of the Beijing Humanoid Robot Innovation Center grew out of this thinking: it lets robots train on large amounts of human video, greatly reducing training costs and improving training efficiency.
It follows that whoever can truly close the full loop of "synthetic data training → real data fine-tuning → actual scenario feedback" will hold the high ground in this competition.
After solving the data problem, how to efficiently integrate Physical AI with virtual AI has become the key to the further development of Physical AI.
An often-overlooked point in today's discussion of Physical AI is that Physical AI and virtual AI are not opposed. Architecturally, a complete Physical AI system can be roughly divided into three layers: a perception layer at the bottom (sensors, visual recognition), a cognitive decision-making layer in the middle (AI reasoning), and an action execution layer at the top (mechanical control).
Virtual AI is mainly responsible for the middle layer, while Physical AI needs to connect the complete chain from perception to execution.
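The three-layer split can be made concrete with a small interface sketch. It is a minimal illustration under stated assumptions: the layer interfaces, the fake camera and arm, the rule-based policy, and the barcode scenario are all hypothetical stand-ins, not the architecture of any product named in this article.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Observation:
    barcode_visible: bool   # what the (fake) camera reports about the package


class PerceptionLayer(Protocol):
    def sense(self) -> Observation: ...


class DecisionLayer(Protocol):
    def decide(self, obs: Observation) -> str: ...


class ExecutionLayer(Protocol):
    def execute(self, command: str) -> str: ...


class FakeCamera:
    """Bottom layer: stands in for sensors and visual recognition."""

    def __init__(self, barcode_visible: bool) -> None:
        self.barcode_visible = barcode_visible

    def sense(self) -> Observation:
        return Observation(barcode_visible=self.barcode_visible)


class RulePolicy:
    """Middle layer: stands in for AI reasoning (the part virtual AI covers)."""

    def decide(self, obs: Observation) -> str:
        return "place" if obs.barcode_visible else "rotate"


class FakeArm:
    """Top layer: stands in for mechanical control."""

    def execute(self, command: str) -> str:
        return f"arm executed: {command}"


def run_once(perception: PerceptionLayer, decision: DecisionLayer,
             execution: ExecutionLayer) -> str:
    """Pass one observation through all three layers, bottom to top."""
    obs = perception.sense()
    command = decision.decide(obs)
    return execution.execute(command)


print(run_once(FakeCamera(True), RulePolicy(), FakeArm()))
print(run_once(FakeCamera(False), RulePolicy(), FakeArm()))
```

Because each layer is only an interface, the middle layer could be swapped for a learned model without touching perception or execution, which is one way to read the claim that the two kinds of AI are complementary.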
NVIDIA's full-stack "chip + model + tools" solution embodies this idea. The Jetson Thor edge-computing platform provides computing power, the GR00T model provides intelligence, and the Isaac platform provides the development toolchain. By the same logic, whoever achieves deep integration of software and hardware will not only close the Physical AI loop from "brain" to "limbs" but also build their own technological moat.
The last dimension is commercialization. Three years ago, the capital market's expectations for the robotics sector rested on "technological vision". Today it applies a more practical standard: delivery ability.
According to media statistics, financing in China's embodied intelligence field totaled 73.5 billion yuan across 744 investment and financing events in 2025. More than 37 billion yuan has been added so far in 2026, bringing the cumulative amount past 110 billion yuan. Beneath this boom, however, there has been a visible structural shift in where the capital flows.
In May 2026, Tianji Intelligence completed a Series B financing of 1 billion yuan. Its core advantage is that its orders in hand exceeded 10,000 units in Q1, and its customers include 45 robotics companies.
Zhongke Fifth Epoch obtained hundreds of millions of yuan in Series A financing during the same period and also disclosed that it had won overseas orders worth hundreds of millions of yuan.
In the financing rounds of Vita Power and Luming Robotics, industrial investors such as SAIC Shangqi Capital and Mitsubishi Electric have come in one after another, aiming to tie production-line capacity to robot delivery ability.
In contrast, the American humanoid robot startup Cartwheel Robotics, which had a technological vision but no orders to back it up, declared bankruptcy in March 2026.
Taken together, these positive and negative cases show that capital no longer pays for cool demos, only for real mass-production and delivery ability.
04 Conclusion
The popularity of Physical AI seems sudden, but in fact, it is a natural result.
Of course, some industry insiders believe that "Physical AI" is more of a new concept packaged by the capital market, and its essence is still the natural evolution of embodied intelligence and robot technology. However, it is undeniable that the rise of Physical AI clearly marks that the AI industry is moving from "virtual intelligence" to "physical execution", which is an irreversible historical process.
In the latest round of competition, Figure AI showed its strength to the world through a live-stream, Zhipu Robotics built an industrial barrier through mass production and delivery, and NVIDIA built a platform ecosystem with Cosmos and GR00T... The next questions are: which company will become the OpenAI of Physical AI, and which application scenario will be the first to have its "ChatGPT moment"?
This article is from the WeChat official account “Insight New Research Club” (ID: DJXYS -