
2026: AI starts to "take root physically"

Sice Think Tank · 2026-01-27 13:30
In 2026, AI is transitioning from the "cloud brain" to the "edge terminal" and being integrated into industrial reality. Physical AI serves as the engine of the new industrial revolution while still facing an arduous road of exploration.

In an era where the technology iteration cycle is constantly shrinking, we always seem to be looking for a "turning year."

As we enter 2026, the visible change is that the mainstream applications of artificial intelligence have irreversibly shifted from generation and dialogue in the digital realm to perception and action in the physical realm.

Last year, I visited a power company in a coastal province under the State Grid. As soon as I entered their substation workshop, I could feel a strong industrial atmosphere. Across the grid's full cycle of generation, transmission, transformation, distribution, and consumption, AI algorithms are being integrated into operations, combining digital twins with multimodal large models. This improves the reliability and efficiency of grid optimization, smart customer experience, ice-coating early warning, and intelligent metering operation, maintenance, and fault diagnosis.

From the cloud brain to edge terminals, from cloud-based intelligent agents down to the physical layer, the fact is that AI is leaping out of the phone screen and becoming increasingly embedded in the operation of the physical world.

Has the "ChatGPT Moment" of AI Arrived?

If AI was once good at predicting the next word in the digital world, it has now shifted to predicting and shaping the next state of the physical world. Jensen Huang calls this the "ChatGPT moment" of AI.

This is a concept that has recently become popular in tech circles: Physical AI. It refers to intelligent systems that can understand physical laws, interact with the real environment, and act to change it, enabling a new research paradigm of "hypothesis → AI simulation → experimental verification." It is expected to become the most imaginative driving engine of this new industrial revolution.

But to be honest, breakthroughs in Physical AI may prove harder won. The industry consensus is that five to ten years of in-depth exploration may only be the beginning.

This leads to the core development logic of Physical AI. Unlike a language model, it cannot simply harvest large-scale symbolic data. Put simply, it must be both the AI and the AI's trainer.

You can understand it this way: an excellent language model needs a large text corpus to learn grammar, logic, and knowledge associations. A reliable Physical AI, by contrast, needs a large corpus of physical interactions to internalize how the world works. It needs to know that if you push a cup sitting on the edge of a table, it will likely fall and break. It also needs to experience that walking on smooth tile and walking on soft sand call for fundamentally different control strategies at the knee joints. This kind of "knowing" and "experiencing" cannot rely on annotation alone; it must rely on experience, whether virtual or real.

Therefore, the development path of Physical AI traces an interesting "stratify, then integrate" spiral. Traditional robotics uses a hierarchical architecture (perception, planning, control) that is clear and modular. Many domestic robot companies have been able to deploy quickly in scenarios such as warehousing and inspection thanks to this mature engineering paradigm. But its ceiling is also obvious: information loss and latency between the modules make it clumsy in dynamic, unknown environments.
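To make the layered paradigm concrete, here is a minimal, purely illustrative sketch of the perception → planning → control pipeline. All class names, sensor fields, and thresholds are hypothetical, not taken from any real robot stack; the point is that information flows one way through separate modules, each of which can lose detail and add latency.

```python
# Hypothetical sketch of the classic hierarchical robot architecture:
# each layer hands a compressed summary to the next.

from dataclasses import dataclass

@dataclass
class Observation:
    obstacle_ahead: bool
    distance_to_goal: float

def perceive(raw_sensor: dict) -> Observation:
    """Perception layer: compress raw sensor values into a symbolic state."""
    return Observation(
        obstacle_ahead=raw_sensor["lidar_min_range"] < 0.5,
        distance_to_goal=raw_sensor["goal_range"],
    )

def plan(obs: Observation) -> str:
    """Planning layer: choose a high-level action from the perceived state."""
    if obs.distance_to_goal < 0.1:
        return "stop"
    return "avoid" if obs.obstacle_ahead else "advance"

def control(action: str) -> dict:
    """Control layer: translate the plan into low-level motor commands."""
    commands = {
        "advance": {"left_wheel": 1.0, "right_wheel": 1.0},
        "avoid":   {"left_wheel": 1.0, "right_wheel": -1.0},  # turn in place
        "stop":    {"left_wheel": 0.0, "right_wheel": 0.0},
    }
    return commands[action]

def step(raw_sensor: dict) -> dict:
    """One tick of the layered pipeline; detail is lost at each hand-off."""
    return control(plan(perceive(raw_sensor)))
```

Note how `plan` sees only the boolean `obstacle_ahead`, not the raw lidar reading: that compression is exactly the modularity that makes the stack easy to engineer, and exactly the information loss that caps its performance in unstructured environments.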

The current trend is a more radical "end-to-end" approach: as in autonomous driving, AI directly maps visual input to action output. This is very brain-like and unified. But the complexity of the physical world and the demands of safety leave this path full of obstacles. Where does the data come from? How is safety guaranteed? A wrong output can mean real collisions and damage; this is not like text generation, where a mistake simply means starting over.

It is in this dilemma that the concept of the "world model" has been pushed to the forefront. It lets an intelligent agent run fast, low-cost deduction and trial-and-error in its "mind" before acting in the real world. That sounds ideal, but building a general physical world model that is realistic enough and efficient to compute remains extremely challenging.

For now, it shows potential mainly in specific closed-loop domains (such as robots manipulating particular objects). We need both the common sense and generalization that the world model provides and the reliability and controllability of the hierarchical architecture. The future mainstream is likely not one replacing the other, but hierarchical decision-making built on a world model: the brain handles imagination and planning, while the "cerebellum and spinal cord" handle reflexes and stability.
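The idea of "deliberating in the mind first" can be illustrated with a minimal random-shooting planner: candidate action sequences are rolled through a world model purely in imagination, and only the best-scoring sequence is selected for execution. The one-dimensional hand-coded dynamics here stand in for what would, in a real agent, be a learned model; every function and constant is an assumption for illustration.

```python
# Sketch of planning inside a world model: imagine many action sequences,
# score them mentally, execute only the winner. The 'world_model' dynamics
# are a made-up 1-D toy; a real agent would learn this model from data.

import random

def world_model(position: float, action: float) -> float:
    """Imagined dynamics: the action nudges the position, with some drag."""
    return 0.9 * position + action

def imagine_cost(position: float, actions: list, goal: float) -> float:
    """Roll the model forward mentally and score distance from the goal."""
    for a in actions:
        position = world_model(position, a)
    return abs(position - goal)

def plan_by_imagination(position: float, goal: float,
                        horizon: int = 5, n_candidates: int = 200) -> list:
    """Random-shooting planning: sample sequences, keep the best imagined one."""
    rng = random.Random(42)
    best, best_cost = None, float("inf")
    for _ in range(n_candidates):
        candidate = [rng.uniform(-1, 1) for _ in range(horizon)]
        cost = imagine_cost(position, candidate, goal)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best

plan = plan_by_imagination(position=0.0, goal=2.0)
```

All 200 "trials" happen in imagination at negligible cost; the real robot moves once. The catch, as the text notes, is that everything depends on `world_model` being faithful enough to reality.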

The industry is improving efficiency through a series of "expand resources, cut costs" measures. The first is synthetic data. In high-fidelity physics simulation engines (such as NVIDIA's Isaac Sim or the open-source MuJoCo), we can generate almost unlimited data at near-zero marginal cost, letting thousands of virtual robots train day and night in virtual factories. But there is a well-known "sim-to-real" gap: however realistic the virtual world, subtle differences in physical parameters always remain between simulation and reality. A robot that walks briskly in simulation may fall over instantly in the real world.
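One standard technique for narrowing this gap is domain randomization: rather than training in one "perfect" simulation, the physical parameters are re-sampled every episode, so the learned policy cannot overfit to any single set of physics. The sketch below is illustrative only; the parameter names and ranges are invented for this example, not drawn from any particular simulator.

```python
# Hedged sketch of domain randomization: each training episode gets a
# freshly sampled plausible world, so no single physics setting can be
# memorized. Parameter names and ranges here are made-up assumptions.

import random

def sample_physics(rng: random.Random) -> dict:
    """Draw a new plausible world for one training episode."""
    return {
        "floor_friction": rng.uniform(0.3, 1.2),    # icy tile .. rough sand
        "payload_mass_kg": rng.uniform(0.0, 5.0),
        "motor_strength": rng.uniform(0.8, 1.2),    # actuator wear/variance
        "sensor_noise_std": rng.uniform(0.0, 0.05),
    }

def make_training_batch(n_episodes: int, seed: int = 0) -> list:
    """Generate one randomized physics configuration per episode."""
    rng = random.Random(seed)
    return [sample_physics(rng) for _ in range(n_episodes)]

worlds = make_training_batch(1000)
```

The hope is that the real world then looks like just one more sample from the training distribution, so a policy robust across all the randomized worlds transfers without falling over.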

Hence another much-watched "resource expansion" idea: pre-training on everyday human videos. The countless first-person videos of daily life and work on YouTube contain a wealth of information about object properties, physical common sense, and manipulation skills. Letting AI models watch them at scale teaches them, in an unsupervised way, basic physical common sense such as "which things are fragile" and "how doors usually open." This has become a shortcut for bridging the simulation gap and injecting human prior knowledge.

For example, cutting-edge efforts such as NVIDIA's GR00T model are practicing a hybrid recipe of "human-video pre-training + simulation fine-tuning + real-robot fine-tuning." This may hint at the eventual solution to Physical AI's data problem: a "trinity" data ecosystem of human experience, virtual simulation, and real-world interaction.

The Chinese Scenario: Between "Cost-Effectiveness" and "Strategic Depth"

When we look at the global competitive landscape, the development of Physical AI presents an interesting contrast. The United States still leads in basic algorithms, chip architectures, and frontier exploration, and its path is full of science-fiction flair and originality. China's path, by contrast, is deeply marked by its own industrial genes and emphasizes engineering deployment in concrete scenarios.

This realism shows first in cost-effectiveness. China's advantage lies in quickly engineering and productizing frontier technologies and keeping costs within a range the market will accept, relying on the world's most complete and fastest-responding supply chain. As in the substation scenario at the beginning, the technology integration, cost control, and deployment efficiency behind it are the keys to landing Physical AI in China. It may not be the first to invent a given algorithm, but it is often the first to apply it stably and cheaply on factory assembly lines, in logistics warehouses, or in power-grid inspection. In the early stage of Physical AI's move from the laboratory into industry, that ability is a powerful force for market penetration.

Second is "strategic depth." Unlike countries that lean toward basic research and free-market exploration, China provides Physical AI with clear application scenarios and industrial channels through top-level design. With the "Artificial Intelligence+" initiative and "embodied intelligence" written into the government work report, a series of large-scale, complex, demand-driven "training grounds" and "test fields" has been systematically opened up, from smart grids to smart agriculture, from flexible production lines to urban management. The goal is concrete: a 70% penetration rate for intelligent terminals by 2027. This sets a realistic yet imaginative coordinate system for the evolution of Physical AI.

Of course, this path has its own challenges. Will over-emphasizing application and cost-effectiveness weaken long-term investment in more fundamental, revolutionary original technologies? How should the efficiency of "concentrating resources on major goals" be balanced against the vitality of grassroots innovation? These questions demand continuing thought.

The Long Road to "Generalization"

The ultimate dream of Physical AI is "generalization": an agent that, like a human, can quickly adapt to environments and tasks it has never seen. But we may be farther from that goal than we think. A reality we must face now is that there is no single-point breakthrough in industrial technology application, only continuous, hard-won small breakthroughs along every dimension: perception, control, planning, materials, and energy.

When an intelligent agent that can easily lift hundreds of kilograms moves autonomously among people, any decision error can have physical consequences. Interpretability, safety redundancy, and ethical norms, which could be partially set aside in the digital-AI era, will therefore become non-negotiable lifelines in the Physical AI era.

From predicting words to predicting the state of the world, artificial intelligence is truly starting to break away from the virtual cradle and trying to use mechanical hands to touch and shape the future reality of humanity.

2026 will not be an endpoint, but it can be regarded as an important milestone: the physical landing of AI has begun.

This article is from the WeChat official account "Sice Think Tank," written by Zhang Zijiong, and published by 36Kr with authorization.