Artificial intelligence has reached halftime, and "Physical AI" is becoming the decisive point.
No one knows what the endgame of artificial intelligence will look like, but the technology giants are betting big on it.
In the fall of 2025, "Physical AI" has taken center stage and become the hotly contested focus of global technology companies.
As NVIDIA approaches a market value of $5 trillion, Physical AI will be the key. At this year's GTC conference, Jensen Huang systematically presented the company's technology strategy for Physical AI and also announced major plans in cutting-edge fields such as quantum computing and 6G networks.
At XPENG Tech Day 2025, held under the theme "Emergence", XPENG outlined a clear picture of Physical AI in future mobility and unveiled four major applications built around it: the second-generation VLA, Robotaxi, the new-generation humanoid robot IRON, and the XPENG HT flight system.
A global race for Physical AI is fully underway. From Silicon Valley to China, technology giants are investing billions to secure decision-making power in the next technology era.
Three Key Links for the Implementation of Physical AI
In 2020, Aslan Miriyev of the Swiss Federal Laboratories for Materials Science and Technology and Mirko Kovac of Imperial College London first proposed the concept of "Physical AI" in the journal "Nature Machine Intelligence", arguing that body, control, perception, and related factors must develop in concert.
In 2024, NVIDIA CEO Jensen Huang identified it as the core direction of AI development and proposed achieving physical-interaction capability through the chain of perception, reasoning, and action.
Physical AI takes artificial intelligence from "digital understanding" into the dimension of "physical interaction" and has become the new benchmark for the core competitiveness of technology companies. Its implementation depends on three key links: physical modeling and training in virtual environments, the generation of and reasoning over high-quality physical data, and a closed perception-decision loop in real scenarios.
Virtual modeling is the foundation of Physical AI. Its core is to create a simulation environment that closely matches the real world by fusing classical physical laws with deep learning. This is achieved mainly through generative physics engines and reinforcement learning, in which neural networks simulate physical laws and generate training data.
The generative physics engine fuses classical physical laws (mechanics, thermodynamics, etc.) with deep learning to create a simulation system with coupled multi-physics fields, supporting dynamic simulation of rigid bodies, fluids, electromagnetism, and other scenarios. It must balance simulation accuracy against real-time performance while scaling to physical scenarios of varying complexity (from simple motions to complex material interactions). There is an inherent tension between high-precision modeling and real-time computation, which must be mitigated through algorithmic optimizations (such as layered integration and dynamic damping adjustment).
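To make the accuracy-versus-real-time tension concrete, here is a minimal Python sketch of a physics-style integrator that adapts its substep count to the stiffness of the system. The stiffness heuristic and all parameter names are illustrative assumptions, not taken from any particular engine:

```python
import math

def simulate_spring(x0, v0, k=50.0, c=0.5, m=1.0, dt=1.0 / 60.0,
                    steps=120, max_substeps=32):
    """Semi-implicit Euler for a damped spring with adaptive substepping.

    The substep count is the accuracy/real-time dial: stiff systems
    (large k/m) get more substeps for stability, at higher compute cost.
    """
    x, v = x0, v0
    for _ in range(steps):
        # Crude stiffness heuristic (illustrative): the natural frequency
        # sqrt(k/m) decides how finely one render frame must be subdivided.
        substeps = min(max_substeps, max(1, int(10 * math.sqrt(k / m) * dt)))
        h = dt / substeps
        for _ in range(substeps):
            a = (-k * x - c * v) / m   # spring and damping forces
            v += a * h                 # semi-implicit Euler: velocity first,
            x += v * h                 # then position with the updated velocity
    return x, v

if __name__ == "__main__":
    print(simulate_spring(x0=1.0, v0=0.0))
```

Lowering the substep count buys frame rate at the cost of the very instability that techniques like dynamic damping adjustment are meant to suppress.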
The performance of Physical AI depends on high-quality data. A hybrid model of "synthetic data + real data" addresses both the scarcity of real physical data and the difficulty of annotating it. Generating high-quality data and reasoning over it rely mainly on combining physical modeling, data acquisition technologies, and generative models, realized through real-world data collection, physical-constraint optimization, and algorithmic generation.
In this link, the physics engine generates synthetic data and generative AI expands its diversity. In the reasoning phase, physical constraints must be embedded so that the system can predict and attribute object motions and interaction relationships. The data must satisfy "physical realism" (consistency with objective laws) and "comprehensive distribution" (coverage of extreme and boundary cases), and the reasoning process must be interpretable rather than a pure black-box prediction. The challenge is the domain gap between synthetic and real data, which must be narrowed through data augmentation and hybrid models; at the same time, efficient reasoning over physical data places higher demands on computing power and algorithm architecture.
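A common technique for meeting the "comprehensive distribution" requirement is domain randomization: sampling physical parameters over wide ranges, boundary cases included, while the simulator keeps every sample physically consistent. A minimal sketch, in which `simulate_push` is a hypothetical stand-in for a real physics engine:

```python
import random

def simulate_push(mass, friction, force, dt=0.01, steps=100):
    """Hypothetical stand-in for a physics engine: a block pushed along a
    plane with simplified (kinetic-only) Coulomb friction."""
    g = 9.81
    v, x = 0.0, 0.0
    for _ in range(steps):
        friction_force = friction * mass * g if v > 0 else 0.0
        a = (force - friction_force) / mass
        v = max(0.0, v + a * dt)   # friction never reverses the motion
        x += v * dt
    return x  # final displacement: the physically consistent label

def generate_synthetic_dataset(n=1000, seed=0):
    """Domain randomization: wide parameter ranges, boundary cases included."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        sample = {
            "mass": rng.uniform(0.1, 10.0),      # very light to heavy objects
            "friction": rng.uniform(0.05, 1.2),  # near-frictionless to sticky
            "force": rng.uniform(0.0, 50.0),     # includes the no-motion boundary
        }
        sample["displacement"] = simulate_push(**sample)
        data.append(sample)
    return data

if __name__ == "__main__":
    for s in generate_synthetic_dataset(3):
        print(s)
```

Because each label is produced by the simulator itself, "physical realism" holds by construction; the remaining gap to real-world data is what augmentation and hybrid real/synthetic training must close.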
The ultimate value of Physical AI lies in deployment in real scenarios. The closed perception-decision loop in the real world depends mainly on the fusion of multimodal data, end-to-end model architectures, and real-time computing power, and is realized through environmental perception, intention understanding, rapid decision-making, and precise execution.
In this link, the model trained in the virtual environment is connected to the real physical world, forming an iterative "perception-decision-execution-feedback" loop that adapts the AI to the uncertainty of real environments. Fusing multiple sensors (vision, force control, inertial measurement, etc.) enables precise perception of the environment and the objects in it. Decision algorithms must combine model predictive control with reinforcement learning to ensure both real-time performance and robustness. Because the complexity of real environments (unstructured, dynamically changing) far exceeds that of virtual scenarios, the model's limited generalization must be addressed; and when deploying at the edge, inference speed, accuracy, and hardware energy consumption must be balanced as well.
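Reduced to its control skeleton, such a loop looks roughly like the sketch below. The complementary-filter fusion and the clipped proportional controller standing in for an MPC/RL policy are deliberate simplifications; all names and gains are illustrative:

```python
import random

def fuse(vision_pos, imu_pos, alpha=0.8):
    """Toy sensor fusion: blend a noisy vision fix with IMU dead reckoning."""
    return alpha * vision_pos + (1 - alpha) * imu_pos

def decide(est_pos, target, kp=0.5, max_u=1.0):
    """Stand-in for an MPC/RL policy: a clipped proportional controller."""
    u = kp * (target - est_pos)
    return max(-max_u, min(max_u, u))

def run_loop(target=5.0, steps=50, seed=1):
    rng = random.Random(seed)
    true_pos, imu_pos = 0.0, 0.0
    for _ in range(steps):
        vision = true_pos + rng.gauss(0, 0.2)  # perception: noisy observation
        est = fuse(vision, imu_pos)            # fusion of two imperfect sources
        u = decide(est, target)                # decision
        true_pos += u                          # execution in the real world
        imu_pos = est + u                      # feedback: dead-reckon onward
    return true_pos

if __name__ == "__main__":
    print(f"final position: {run_loop():.3f} (target 5.0)")
```

In a real system, every stage of this loop competes for the same edge compute budget, which is precisely the speed/accuracy/energy trade-off described above.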
The Strategic Positioning of Foreign Technology Giants
Since Physical AI has become the next key battleground in artificial intelligence, global technology giants have embarked on distinctive development paths that play to their respective strengths.
At the Smart City Expo World Congress in Barcelona, NVIDIA showcased the application results of Physical AI. By integrating platforms such as Omniverse, Cosmos, and Metropolis, it can not only simulate real-world environments but also generate synthetic data, train vision-language models (VLMs), and analyze a city's video streams with AI agents, forming a complete ecological closed loop from data to decision-making.
Facing the high cost and high risk of training in real scenarios (such as robot tests in hazardous working environments or aircraft aerodynamics experiments), Omniverse offers a solution through "high-precision physical modeling + digital twins". It can simulate multi-physics effects such as gravity, friction, and fluid mechanics, and it supports virtual validation of robot hardware designs and algorithms, shortening the prototype iteration cycle.
The second problem Physical AI faces is a data shortage: it needs high-quality data with physical attributes. Cosmos overcomes this bottleneck with the dual capability of "generative modeling + physical reasoning". From text and image inputs it generates physically realistic video data, addressing the inability of traditional VLMs to handle multi-step physical tasks; it can predict physical changes from prior knowledge and autonomously derive the next step or action.
In dynamic real scenarios, Physical AI requires low-latency perception and real-time decision-making (such as collision avoidance in autonomous driving or traffic control in smart cities). Metropolis builds the perception foundation through "edge visual analysis + computing-power coordination". It captures dynamic physical states across modalities through perception devices and accelerates real-time inference at the edge, meeting Physical AI's requirement for millisecond-level action generation.
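To make "millisecond-level" tangible, here is a toy latency-budget check for one perceive-infer-act cycle; the stage names, simulated durations, and the 10 ms budget are illustrative assumptions, not Metropolis specifics:

```python
import time

BUDGET_MS = 10.0  # illustrative end-to-end budget for one perceive-act cycle

def capture():
    """Mock sensor read (stand-in for a camera frame grab)."""
    time.sleep(0.001)
    return "frame"

def infer(frame):
    """Mock edge-accelerated model inference."""
    time.sleep(0.003)
    return "detections"

def act(detections):
    """Mock actuation command."""
    time.sleep(0.001)

def timed_cycle():
    start = time.perf_counter()
    act(infer(capture()))
    return (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    worst = max(timed_cycle() for _ in range(20))
    verdict = "within" if worst <= BUDGET_MS else "over"
    print(f"worst-case cycle: {worst:.2f} ms ({verdict} the {BUDGET_MS} ms budget)")
```

The point of measuring the worst case rather than the average is that a single late actuation command is what causes a missed obstacle or a dropped object.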
Google DeepMind has taken the path of general intelligence. In September this year, DeepMind officially released its new generation of general-purpose robot foundation models, the Gemini Robotics 1.5 series. The series consists of two models: Gemini Robotics 1.5 (GR 1.5), a multimodal large model for action execution, and Gemini Robotics-ER 1.5 (GR-ER 1.5), which strengthens reasoning and provides planning and understanding support; ER stands for "embodied reasoning". The series goes beyond understanding language and images: it combines vision, language, and action (VLA) and realizes "think first, then act" through embodied reasoning.
Together, the two models enable robots not only to perform simple actions such as folding paper or opening bags but also to solve multi-step tasks that require interpreting external information and breaking down complex processes, such as sorting light and dark laundry or packing a suitcase according to the weather at the destination. They can even search the Internet for location-specific requirements (such as the different waste-separation standards in Beijing and Shanghai) to help people sort their waste. The models can also transfer capabilities across different robot platforms without additional training examples.
Tesla pursues a product-driven strategy. The Optimus Generation 2 robot is equipped with a self-developed physics engine, and its 22-degree-of-freedom hand can fold shirts and sort objects in the factory. Tesla can also convert driving data from millions of its cars into training material for Physical AI, forming a unique closed loop in which vehicle scenarios feed robot R&D. Elon Musk has high hopes for the Optimus project and has set an extremely aggressive goal: producing up to 5,000 Optimus robots by the end of this year.
In June this year, Amazon announced that it would set up a new agentic AI team within its secretive hardware research unit Lab126 and begin R&D on Physical AI. The move marks Amazon's official entry into Physical AI research, and in particular a deeper exploration of robotics.
Jobs in Amazon warehouses could be among the first affected by Physical AI. Amazon recently introduced a versatile new warehouse robot system called "Blue Jay" and announced that it is already being tested at a warehouse in South Carolina. Blue Jay integrates picking, sorting, and consolidation, combining three separate robot workstations into one.
Amazon plans to automate 75% of its warehouse operations by 2027, which could allow it to avoid hiring more than 500,000 workers and save $12.6 billion in labor costs.
In addition to Blue Jay, Amazon presented two further innovations. One is an agentic AI system called "Project Eluna" that is intended to support operations managers' decision-making; it can integrate historical and real-time data, predict operational bottlenecks, and recommend solutions to operators. The other is a pair of AR glasses for delivery drivers, which integrates artificial intelligence, sensors, and cameras, overlays information such as route navigation and hazard warnings (for example, flagging a dog at a customer's door) in the driver's field of vision, and can scan packages.
Physical AI Reshapes Productivity
Behind this global race lies the enormous potential of Physical AI to reshape the structure of productivity.
Gartner has made a sensational prediction: by 2030, every job in the IT department will be closely linked to AI. AI will fundamentally change traditional work patterns and the demand for skilled employees. Within the next five years, 25% of IT jobs will be performed entirely autonomously by robots, while the remaining 75% will be done by human employees with the help of...