
As artificial intelligence enters its midgame, "Physical AI" has become a crucial turning point.

极智GeeTech · 2025-11-07 07:38
It concerns the future, and it concerns you and me.

No one knows what the endgame of artificial intelligence will be, but tech giants are placing huge bets.

In the autumn of 2025, "Physical AI" stepped into the spotlight and became the focus of competition among global tech companies.

As NVIDIA stands at the threshold of a $5 trillion market value, Physical AI has become the key to that door. At this year's GTC conference, Jensen Huang systematically laid out NVIDIA's Physical AI strategy and announced major moves in cutting-edge fields such as quantum computing and 6G networks.

At the 2025 XPeng Tech Day, themed "Emergence", XPeng Motors outlined a clear picture of Physical AI in future mobility and released four major applications built around it: the second-generation VLA, Robotaxi, the new-generation humanoid robot IRON, and the XPeng HT Flying System.

A global Physical AI competition has fully kicked off. From Silicon Valley to China, tech giants are investing hundreds of billions to compete for dominance in the next technological era.

Three Key Steps for the Implementation of Physical AI

In 2020, Aslan Miriyev from the Swiss Federal Laboratories for Materials Science and Technology and Mirko Kovac from Imperial College London first proposed the concept of "Physical AI" in Nature Machine Intelligence, emphasizing the collaborative evolution of elements such as the body, control, and perception.

In 2024, NVIDIA CEO Jensen Huang identified it as the core direction of AI development and proposed achieving physical interaction capabilities through a chain of perception, reasoning, and action.

Physical AI pushes artificial intelligence from "digital understanding" into the dimension of "physical interaction" and has become a new yardstick for measuring the core competitiveness of tech companies. Its implementation depends on three key steps: physical modeling and training in virtual environments, generation and reasoning over high-quality physical data, and a closed perception-decision loop in real scenarios.

Virtual modeling is the foundation of Physical AI. Its core is to build a simulation environment highly consistent with the real world by integrating classical physical laws with deep learning, mainly through generative physics engines and reinforcement learning: neural networks help simulate physical laws and generate training data.

The generative physics engine integrates classical physical laws (mechanics, thermodynamics, etc.) with deep learning to build a simulation system with multi-physics coupling, supporting dynamic simulation of scenarios such as rigid bodies, fluids, and electromagnetics. In this process, simulation accuracy must be balanced against real-time performance, and the system must scale to physical scenarios of different complexities (from simple motions to complex material interactions). There is an inherent tension between high-precision modeling and real-time computation, and the gap has to be narrowed through algorithmic optimization (such as hierarchical integration and dynamic damping adjustment).
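
To make the accuracy-versus-real-time trade-off concrete, here is a minimal Python sketch of adaptive time stepping, one simple form of the hierarchical-integration idea mentioned above. It is an illustration only, not any particular engine's implementation; the dynamics (a bouncing ball with quadratic air drag) and all tolerances are assumptions chosen for the example.

```python
import numpy as np

def accel(v, g=9.81, drag=0.4):
    """Net acceleration: gravity plus quadratic air drag (the nonlinear
    term makes the step-size comparison below meaningful)."""
    return -g - drag * v * abs(v)

def step(y, v, dt):
    """One semi-implicit Euler step."""
    v_new = v + accel(v) * dt
    return y + v_new * dt, v_new

def simulate(t_end=3.0, dt_max=0.02, dt_min=1e-4, tol=1e-4):
    y, v, t, dt = 2.0, 0.0, 0.0, dt_max
    samples = [(t, y)]
    while t < t_end:
        y_full, _ = step(y, v, dt)                  # one big step
        y_h, v_h = step(y, v, dt / 2)               # two half steps
        y_two, v_two = step(y_h, v_h, dt / 2)
        if abs(y_full - y_two) > tol and dt > dt_min:
            dt = max(dt / 2, dt_min)                # refine near fast dynamics
            continue
        y, v, t = y_two, v_two, t + dt
        if y <= 0.0:                                # ground contact: bounce
            y, v = 0.0, -0.8 * v
        dt = min(dt * 1.5, dt_max)                  # relax when motion is smooth
        samples.append((t, y))
    return np.array(samples)

print(simulate().shape)  # (number of adaptive steps taken, 2)
```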

The performance of Physical AI depends on high-quality data. A virtual-real fusion model of "synthetic data + real data" addresses the twin pain points of scarce real physical data and difficult annotation. Generating and reasoning over high-quality data relies on combining physical modeling, data acquisition, and generative models, realized through real data collection, physical-constraint optimization, and algorithmic generation.

In this step, synthetic data is generated through the physics engine, and generative AI expands its diversity. In the reasoning stage, physical constraints must be embedded to predict and attribute object motion and interaction relationships. The data must satisfy both "physical authenticity" (consistency with objective laws) and "comprehensive distribution" (coverage of extreme scenarios and boundary conditions), and the reasoning process must be interpretable rather than a pure black box. The challenge lies in the domain gap between synthetic and real data, which must be narrowed through data augmentation and virtual-real fusion techniques. At the same time, efficient reasoning over physical data places higher demands on computing power and algorithm architecture.
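
One widely used way to widen the distribution of synthetic data and shrink the domain gap is domain randomization: sampling the physical parameters of the simulator itself. The sketch below generates projectile trajectories under randomized gravity, drag, and launch conditions; the parameter ranges are hypothetical, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_throws(n=1000):
    """Generate synthetic projectile trajectories with randomized physics,
    so data stays physically consistent while covering a wide distribution."""
    data = []
    for _ in range(n):
        g = rng.uniform(9.6, 10.0)          # randomized gravity
        drag = rng.uniform(0.0, 0.1)        # randomized air drag
        v0 = rng.uniform(2.0, 15.0)         # launch speed
        angle = rng.uniform(0.2, 1.3)       # launch angle, radians
        vx, vy = v0 * np.cos(angle), v0 * np.sin(angle)
        x = y = 0.0
        traj, dt = [], 0.01
        while y >= 0.0:
            traj.append((x, y))
            speed = np.hypot(vx, vy)
            vx -= drag * speed * vx * dt            # drag opposes motion
            vy -= (g + drag * speed * vy) * dt      # gravity plus drag
            x += vx * dt
            y += vy * dt
        data.append(np.array(traj))
    return data

trajs = synthesize_throws(100)
print(len(trajs), trajs[0].shape)  # 100 trajectories of varying length
```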

The ultimate value of Physical AI lies in deployment in real scenarios. The perception-decision loop there relies on multi-modal data fusion, end-to-end model architectures, and real-time computing power, closing the loop through environmental perception, intention understanding, rapid decision-making, and precise execution.

This step connects the model trained in the virtual environment with the real physical world, completing the closed-loop iteration of "perception-decision-execution-feedback" and enabling AI to adapt to the uncertainty of real environments. Multi-sensor fusion (vision, force sensing, inertial measurement, etc.) enables accurate perception of the environment and object states. Decision-making algorithms need to combine model predictive control with reinforcement learning, balancing real-time performance and robustness. The complexity of real environments (unstructured and dynamically changing) far exceeds that of virtual scenarios, so the problem of insufficient model generalization must be solved. At the same time, edge-side deployment must further balance inference speed, accuracy, and hardware power consumption.
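
A minimal sketch of such a loop, assuming a toy one-dimensional world: a complementary filter stands in for multi-sensor fusion, and a tiny rollout-based controller stands in for model predictive control. All noise levels, gains, and targets are illustrative assumptions.

```python
import numpy as np

class ComplementaryFilter:
    """Fuse a fast-but-drifting relative cue (e.g., IMU integration) with
    a slow-but-absolute one (e.g., vision): minimal multi-sensor fusion."""
    def __init__(self, alpha=0.98):
        self.alpha, self.state = alpha, 0.0
    def update(self, imu_delta, vision_abs):
        predicted = self.state + imu_delta
        self.state = self.alpha * predicted + (1 - self.alpha) * vision_abs
        return self.state

def control(position, target, horizon=10, dt=0.05):
    """MPC-style choice: pick the velocity command whose predicted
    rollout ends closest to the target, with a small effort penalty."""
    candidates = np.linspace(-1.0, 1.0, 21)
    costs = [(abs(position + v * horizon * dt - target) + 0.01 * v**2, v)
             for v in candidates]
    return min(costs)[1]

# Closed loop: perceive -> decide -> execute -> feedback.
rng = np.random.default_rng(1)
fusion, true_pos, target, prev_v = ComplementaryFilter(), 0.0, 1.0, 0.0
for _ in range(100):
    imu_delta = prev_v * 0.05 + rng.normal(0.0, 0.002)  # noisy motion cue
    vision = true_pos + rng.normal(0.0, 0.05)           # noisy absolute fix
    estimate = fusion.update(imu_delta, vision)
    v_cmd = control(estimate, target)
    true_pos += v_cmd * 0.05                            # world responds
    prev_v = v_cmd
print(round(true_pos, 3))  # should settle near the 1.0 target
```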

Strategic Positioning of Overseas Tech Giants

As Physical AI has become the next key battleground in the field of artificial intelligence, global tech giants have formed distinct development paths based on their respective advantages.

At the Barcelona Smart City Expo World Congress, NVIDIA comprehensively demonstrated its Physical AI results. By integrating platforms such as Omniverse, Cosmos, and Metropolis, it can simulate real-world environments, generate synthetic data, train vision-language models (VLMs), and analyze urban video streams through AI agents, forming a complete ecological closed loop from data to decision-making.

To address the pain points of high training costs and high risks in real-world scenarios (such as high-risk robot operation tests and aircraft aerodynamics experiments), Omniverse provides solutions through "high-precision physical modeling + digital twins". It can simulate multiple physical fields such as gravity, friction, and fluid mechanics, and it supports virtual validation of robot hardware designs and algorithms, shortening the prototype iteration cycle.

The second pain point facing Physical AI is data depletion: it requires high-quality data with physical attributes. Cosmos breaks through this bottleneck with its dual capabilities of "generative modeling + physical reasoning". It can generate physically realistic video data from text and image inputs, overcoming traditional VLMs' inability to handle multi-step physical tasks, and it can predict physical changes from prior knowledge and independently reason about the next step or action.

In dynamic real-world scenarios, Physical AI requires low-latency perception and real-time decision-making (such as obstacle avoidance in autonomous driving and traffic scheduling in smart cities). Metropolis builds the perception foundation through "edge visual analysis + computing-power collaboration". It can capture multi-modal physical dynamics through perception devices and accelerate real-time inference at the edge, meeting Physical AI's millisecond-level action-generation requirements.
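
The pattern behind millisecond-level action generation is a hard latency budget on every perceive-act cycle. The sketch below shows that pattern generically; it is not Metropolis code, and the budget figure and stubbed model call are assumptions (a real edge deployment would invoke an optimized inference runtime).

```python
import time
from collections import deque

FRAME_BUDGET_MS = 33.0   # ~30 FPS perception budget (illustrative figure)

def detect(frame):
    """Placeholder for an edge-accelerated vision model."""
    time.sleep(0.005)     # pretend inference takes ~5 ms
    return {"obstacles": []}

recent = deque(maxlen=30)  # rolling latency window for monitoring
for frame_id in range(60):
    start = time.perf_counter()
    detections = detect(frame_id)
    latency_ms = (time.perf_counter() - start) * 1000
    recent.append(latency_ms)
    if latency_ms > FRAME_BUDGET_MS:
        # Degrade gracefully instead of acting on stale perception.
        print(f"frame {frame_id}: over budget ({latency_ms:.1f} ms), skipping act step")
        continue
    # act(detections) would issue the control command here

print(f"p95-ish latency: {sorted(recent)[int(len(recent) * 0.95) - 1]:.1f} ms")
```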

Google DeepMind has taken the route of general intelligence. In September this year, DeepMind officially released its new-generation general robot foundation models, the Gemini Robotics 1.5 series. The series consists of two models: Gemini Robotics 1.5 (GR 1.5), a multi-modal large model responsible for action execution, and Gemini Robotics-ER 1.5 (GR-ER 1.5), which strengthens reasoning and provides planning and understanding support; ER stands for "embodied reasoning". These models not only understand language and images but also combine vision, language, and action (VLA), realizing "think first, then act" through embodied reasoning.

Together, the two models enable robots not only to perform single actions such as folding paper and opening bags but also to solve multi-step tasks that require understanding external information and decomposing complex processes, such as sorting light and dark laundry or packing luggage according to the destination's weather. A robot can even search the Internet for location-specific requirements (such as the different garbage-sorting standards in Beijing and Shanghai) to help people sort their garbage. The models also achieve zero-shot cross-platform transfer of capabilities across multiple different robots.
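
The "think first, then act" split can be sketched as a planner-executor loop: one model decomposes the task into steps, another turns each step into motion. The sketch below is a generic illustration of that pattern, not Gemini Robotics code; the task, canned plan, and function names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    description: str
    done: bool = False

def reasoner_plan(task: str) -> list[Step]:
    """Stand-in for an embodied-reasoning model (the planner). A real
    system would query a VLM; a canned plan keeps the sketch self-contained."""
    plans = {
        "sort laundry": ["locate clothes pile", "pick next item",
                         "classify light vs dark", "place in matching basket"],
    }
    return [Step(s) for s in plans.get(task, ["inspect scene"])]

def actor_execute(step: Step) -> bool:
    """Stand-in for the action model (the VLA executor): turns one short
    instruction into motor commands and reports success."""
    print(f"executing: {step.description}")
    return True  # pretend the motion succeeded

# Think first (decompose), then act (execute step by step, with feedback).
for step in reasoner_plan("sort laundry"):
    step.done = actor_execute(step)
    if not step.done:
        print("step failed -> replan")   # replanning omitted in this sketch
        break
```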

Tesla adheres to a product-driven strategy. The second-generation Optimus robot is equipped with a self-developed physics engine, and its dexterous hands, with 22 degrees of freedom, can complete factory tasks such as folding shirts and sorting items. Tesla can also convert driving data from millions of its cars into training material for Physical AI, forming a unique closed loop in which the mobility scenario feeds back into robot R&D. Elon Musk has high hopes for the Optimus project and has set an extremely aggressive goal of producing up to 5,000 Optimus robots by the end of this year.

In June this year, Amazon announced that it would form a new Agentic AI team within Lab126, its secretive hardware R&D division, to begin work on Physical AI. The decision marks Amazon's official entry into Physical AI R&D, especially its in-depth exploration of robotics.

Jobs in Amazon's warehouses may be the first to be affected by Physical AI. Recently, Amazon released a new multi-functional warehouse robot system called "Blue Jay" and revealed that it has been tested in a warehouse in South Carolina, USA. Blue Jay integrates multiple steps such as picking, sorting, and package consolidation, aiming to combine three previously independent robot workstations into one.

Amazon plans to achieve 75% automation in warehousing and logistics by 2027, which could eliminate the need to hire for more than 500,000 positions and save $12.6 billion in labor costs.

Beyond Blue Jay, Amazon has also launched two other innovations. One is an agent-based AI system called "Project Eluna", which aims to provide decision support for operations managers by integrating historical and real-time data, predicting operational bottlenecks, and recommending solutions to operators. The other is augmented reality (AR) glasses designed for delivery drivers. The glasses integrate artificial intelligence, sensors, and cameras; they can overlay information such as route navigation and hazard warnings (for example, that there is a dog at the customer's residence) in the driver's field of vision and can scan packages.

Physical AI Reinvents Productivity

Behind this global competition lies the huge potential of Physical AI to reshape the productivity landscape.

Gartner has made a significant prediction: by 2030, all IT-department work will be deeply bound to AI, which will completely reshape traditional work patterns and talent demand. Within the next five years, 25% of IT work will be performed entirely autonomously by robots, while the remaining 75% will be completed by human practitioners working with AI tools.

The ultimate value of Physical AI lies in liberating humans from repetitive physical labor. When Robotaxis handle urban commuting, robots undertake high-risk operations, and flying cars open up low-altitude corridors, humans can focus on higher-value activities such as creativity and R&D. When every machine can understand the physical world, humans will gain unprecedented freedom, and productivity will take a huge leap.

In the industrial field, the core of Physical AI is to upgrade traditional "rigid automation" to "flexible autonomy", achieving efficiency gains and cost optimization across the entire production process. Its transformation logic revolves around three pillars: "digital twin training ground + autonomous decision-making robots + full-chain collaborative optimization".

Digital twin technology frees factories from the inefficient model of physical trial-and-error. Every detail of industrial design and manufacturing can be simulated and optimized in virtual space, greatly shortening the production cycle and reducing the failure rate in the early stage of product launch. More importantly, the deep integration of physical simulation and AI addresses the pain point that traditional industrial robots cannot think. In the simulation environment, robots can complete millions of scenario training runs, from warehouse sorting to equipment maintenance and from parts assembly to fault troubleshooting, forming optimal operation strategies without occupying real production capacity.
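
As a toy illustration of training a policy in simulation before deployment, here is a minimal tabular Q-learning loop in a one-dimensional "warehouse aisle". The environment, rewards, and hyperparameters are all hypothetical; real systems train far richer policies inside high-fidelity twins.

```python
import numpy as np

# The robot starts at cell 0 and must reach the pick station at cell 4.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s_next == n_states - 1 else -0.01   # small step penalty
        # Standard Q-learning temporal-difference update.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # policy for states 0-3 should be 1 ("move right")
```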

Full-chain collaborative optimization moves productivity improvement from single-point breakthroughs to systematic upgrades. Through preset algorithm modules built into the decision-optimization platform, the response time for production-plan adjustments can be shortened from hours to around ten minutes, further reducing overall production costs. A deliberately simplified sketch of such rapid re-planning follows.
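
In the sketch, when one machine fails, its queued jobs are greedily reassigned to the least-loaded survivors. The data is made up, and a real module would use a proper scheduler or optimization solver rather than this greedy heuristic.

```python
# Machine loads in minutes; all figures are hypothetical.
machines = {"M1": 120, "M2": 90, "M3": 150}
queued_jobs_on_failed = [30, 45, 25]          # jobs stranded when M3 fails

failed = "M3"
machines.pop(failed)
for job in sorted(queued_jobs_on_failed, reverse=True):  # longest job first
    target = min(machines, key=machines.get)             # least-loaded machine
    machines[target] += job
    print(f"reassign {job}-minute job -> {target}")

print("loads after greedy reassignment:", machines)
```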

In transportation and energy, two fields crucial to the national economy and people's livelihood, Physical AI is reshaping the industry's productivity landscape by precisely controlling complex physical systems, simultaneously addressing the twin pain points of low efficiency and safety risks.

In autonomous driving, Physical AI is the key to moving from "laboratory demonstration" to "commercial deployment", taming the complexity and uncertainty of real roads. Relying on an architecture of "multi-sensor fusion + physical world model + massive computing power", an autonomous driving system can accurately perceive road conditions, vehicle positions, and pedestrian dynamics, addressing the degraded perception and decision accuracy that traditional autonomous driving suffers in extreme weather and unexpected situations.

Physical AI upgrades transportation productivity from single-vehicle operation to cluster-level intelligent scheduling. Through multi-modal physical-world models such as MogoMind, autonomous driving fleets can achieve dynamic route planning and real-time capacity allocation, further improving urban travel efficiency and reducing logistics costs, completely changing the inefficient mode in which vehicles in the traditional transportation system operate in isolation.
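
Dynamic route planning in its simplest form is repeated shortest-path search over live travel times. The sketch below runs Dijkstra's algorithm on a hypothetical road graph and reroutes when congestion is reported; production fleet scheduling adds capacity constraints, demand forecasting, and far larger graphs.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over current travel times (minutes per edge)."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], dst                 # walk predecessors back to src
    while node != src:
        path.append(node)
        node = prev[node]
    return [src] + path[::-1], dist[dst]

# Hypothetical road graph with edge weights = current travel minutes.
roads = {"A": {"B": 4, "C": 2}, "B": {"D": 5}, "C": {"B": 1, "D": 8}, "D": {}}
print(shortest_path(roads, "A", "D"))    # baseline route via C and B
roads["C"]["B"] = 9                      # congestion reported on C -> B
print(shortest_path(roads, "A", "D"))    # fleet reroutes immediately
```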

In the energy field, Physical AI is promoting the transformation of clean energy from "intermittent supply" to "stable output" and optimizing energy utilization efficiency. Physical AI can dynamically adjust power-distribution strategies according to real-time load, renewable generation output, and transmission-line loss data, reducing grid losses and improving the renewable-energy absorption rate.
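
One elementary building block of such dispatch is merit order: commit the cheapest sources first, net of line losses, so renewables are absorbed before fossil generation. The sketch below is a hedged illustration with made-up capacities, costs, and loss factors, not a model of any real grid.

```python
# (name, capacity_mw, marginal_cost_per_mwh, line_loss_fraction) -- hypothetical.
sources = [
    ("solar", 300, 0.0, 0.02),
    ("wind", 250, 0.0, 0.03),
    ("hydro", 200, 12.0, 0.01),
    ("gas", 400, 55.0, 0.02),
]
load_mw = 600.0

remaining, plan = load_mw, []
for name, cap, cost, loss in sorted(sources, key=lambda s: s[2]):  # cheapest first
    if remaining <= 0:
        break
    delivered = min(cap * (1 - loss), remaining)   # net of transmission loss
    plan.append((name, round(delivered, 1)))
    remaining -= delivered

print(plan)  # renewables committed first -> higher absorption rate
print("unserved MW:", round(max(remaining, 0.0), 1))
```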

Although Physical AI has made significant progress, to fully reshape productivity, it still needs to overcome three core challenges.

First, Physical AI is deeply coupled with physical systems, and decision-making errors may lead to serious consequences such as production accidents and medical risks. However, there is no unified global safety standard for Physical AI yet.

Second, differences between the simulation environment and the real world (such as materials, lighting, and interference) still limit the generalization ability of AI models; Sim2Real transfer remains a technical difficulty.

Third, the R&D costs of high-end sensors, GPU computing power, and customized algorithms are extremely high and difficult for small and medium-sized enterprises to afford, limiting the speed at which the technology spreads.

The competition among global tech giants over Physical AI is essentially a fight for technological dominance in the next decade. Physical AI is not only the upgrade path for artificial intelligence but also a core indicator of a country's technological competitiveness: it determines the pace of high-end manufacturing and shapes trillion-scale markets such as future mobility and robotics.

By 2030, Physical AI will fully penetrate every corner of production and life: factories will achieve 100% autonomous production, agricultural robots will complete the entire process from sowing to harvesting, autonomous driving fleets will dominate urban travel, fusion power plants will provide stable clean energy, and AI doctors will deliver accurate diagnosis and treatment.