2026: AI begins to "take root physically"
In an era of ever-shorter technology iteration cycles, we always seem to be searching for a "turning-point year."
Entering 2026, the most visible change is that mainstream AI applications have shifted, irreversibly, from digital generation and dialogue toward perceiving and acting in the physical world.
Last year, I visited a power company in a coastal province under the State Grid. As soon as I entered their substation workshop, I was struck by the strong industrial atmosphere. Across the grid's full cycle of generation, transmission, transformation, distribution, and consumption, AI algorithms are being fused with digital twins and multimodal large models to improve reliability and operational efficiency in grid optimization, smart user experience, icing warning, intelligent metering operation and maintenance, and fault diagnosis.
From intelligent agents built in the cloud to terminals at the physical edge, from the cloud "brain" to on-site devices, AI is stepping out from behind mobile phone screens and becoming increasingly embedded in the operation of the physical world.
Has the "ChatGPT Moment" of AI Arrived?
If earlier AI was good at predicting the next word in the digital world, it now focuses on predicting and shaping the next state of the physical world. Jensen Huang has called this the "ChatGPT moment" of physical AI.
This points to a concept that has recently become popular in tech circles: Physical AI. Its defining trait is that it is an intelligent system that can understand physical laws, interact with the real environment, and effect change, enabling a new scientific research loop of "hypothesis, AI simulation, experimental verification." It is expected to become the most imaginative driving force of this new industrial revolution.
To be honest, the breakthrough for Physical AI may be harder than it looks. The industry consensus is that five to ten years of in-depth exploration may only be the beginning.
This leads to the core development logic of Physical AI. Unlike language models, it cannot grow by simply collecting large-scale symbolic data. Simply put, it must be both the AI and its own AI trainer: it has to generate much of the experience it learns from.
You can understand it this way: an excellent language model needs a vast text corpus to learn grammar, logic, and knowledge associations. A reliable Physical AI, by contrast, needs a vast corpus of physical interaction to internalize how the world works. It needs to know that giving a cup at the edge of a table a hard push will probably send it to the floor in pieces; it also needs to understand that the strategy for driving a knee joint must be fundamentally different on smooth tile than on soft sand. This kind of "knowing" and "understanding" cannot be achieved through annotation alone; it must rely on "experience," whether virtual or real.
Therefore, the development path of Physical AI presents an interesting spiral of "stratification and integration." Traditional robotics uses a hierarchical architecture (perception, planning, control) that is clear and modular. Many domestic robot companies can quickly implement applications in scenarios such as warehousing and inspection thanks to this mature engineering paradigm. However, its ceiling is also obvious: the information loss and delay between modules make it clumsy in dynamic, unknown environments.
The current trend is to push toward a more extreme "end-to-end" approach: letting AI map visual input directly to action output, just as in autonomous driving. This is very "brain-like" and unified. However, the complexity and safety requirements of the physical world leave this path full of obstacles. Where does the data come from? How can safety be guaranteed? A wrong output may mean a real collision and real damage, unlike text generation, where mistakes can simply be corrected.
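To make the contrast concrete, here is a minimal Python sketch of the two paradigms. Every interface and number here is an illustrative assumption, not any company's actual stack.

```python
import numpy as np

# --- Paradigm 1: hierarchical (perception -> planning -> control) ---
def perceive(image):
    """Perception module: extract obstacle pixel coordinates (stubbed)."""
    return np.argwhere(image > 0.5)

def plan(obstacles):
    """Planning module: pick a waypoint away from the obstacle centroid."""
    if len(obstacles) == 0:
        return np.zeros(2)
    return -obstacles.mean(axis=0) / 100.0

def control(waypoint):
    """Control module: proportional controller toward the waypoint."""
    return np.clip(0.5 * waypoint, -1.0, 1.0)

def hierarchical_step(image):
    # Clear and modular, but information is lost at each hand-off,
    # and every hand-off adds latency.
    return control(plan(perceive(image)))

# --- Paradigm 2: end-to-end (pixels -> motor commands) ---
def end_to_end_step(image, policy_weights):
    # One learned mapping from raw pixels straight to motor commands,
    # as in end-to-end autonomous driving: unified and "brain-like",
    # but data-hungry and hard to verify.
    return np.tanh(policy_weights @ image.ravel())

image = np.random.rand(8, 8)            # stand-in camera frame
weights = 0.1 * np.random.randn(2, 64)  # stand-in trained policy
print(hierarchical_step(image), end_to_end_step(image, weights))
```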
It is in this dilemma that the concept of the "world model" has come to the forefront. It lets intelligent agents run fast, low-cost rehearsals and error checks in their "minds" before acting for real. This sounds ideal, but building a general physical world model that is both realistic enough and efficient to compute remains extremely challenging.
Currently, it shows potential mainly in specific closed domains (such as robots manipulating specific objects). We need both the common sense and generalization that a "world model" provides and the reliability and controllability of the hierarchical architecture. In the future, the mainstream is likely not one replacing the other but hierarchical decision-making built on a world model: the "brain" is responsible for imagination and planning, while the "cerebellum" and "spinal cord" are responsible for reflexes and stability.
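As a thought experiment, that division of labor might look like the sketch below: a planner that imagines futures inside a toy "world model" (random-shooting model-predictive control), plus a fast low-level reflex loop. The dynamics, costs, and numbers are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model(state, action):
    """Stand-in for a learned dynamics model: predicts the next state.
    Here a toy linear system; in practice this is a trained network."""
    return 0.95 * state + 0.1 * action

def imagine_cost(state, actions):
    """Roll a candidate action sequence out 'in the mind' and score it."""
    cost = 0.0
    for a in actions:
        state = world_model(state, a)
        cost += state**2 + 0.01 * a**2   # stay near zero, act gently
    return cost

def plan_with_world_model(state, horizon=10, candidates=256):
    """The 'brain': imagine many futures, keep the best first action."""
    samples = rng.uniform(-1, 1, size=(candidates, horizon))
    costs = [imagine_cost(state, seq) for seq in samples]
    return samples[int(np.argmin(costs))][0]

def low_level_controller(action, state):
    """The 'cerebellum/spinal cord': fast, simple stabilizing reflex."""
    return np.clip(action - 0.2 * state, -1.0, 1.0)

state = 5.0
for _ in range(50):
    a = plan_with_world_model(state)   # slow, deliberate planning
    state = 0.95 * state + 0.1 * low_level_controller(a, state)
print(f"final state: {state:.3f}")
```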
The industry is using a series of methods to "open new sources and cut costs." The first is synthetic data. In high-fidelity physics simulation engines (such as NVIDIA's Isaac Sim and the open-source MuJoCo), we can generate almost unlimited data at near-zero marginal cost, letting thousands of virtual robots train day and night in virtual factories. However, there is the well-known "simulation-to-reality" gap: no matter how realistic the virtual world is, its physical parameters always differ subtly from the real world's. A robot that walks smoothly in simulation may fall over the instant it steps into reality.
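To give a flavor of how such randomized training data gets generated, here is a minimal sketch using the open-source MuJoCo Python bindings. The toy scene and the randomization ranges are assumptions for illustration, not any production pipeline.

```python
# Domain randomization: vary physics parameters between rollouts so a
# policy cannot overfit to one exact set of simulator constants. This
# is the standard countermeasure to the simulation-to-reality gap.
import numpy as np
import mujoco

XML = """
<mujoco>
  <worldbody>
    <geom name="floor" type="plane" size="5 5 0.1"/>
    <body name="box" pos="0 0 0.5">
      <freejoint/>
      <geom name="box_geom" type="box" size="0.1 0.1 0.1" mass="1.0"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
base_mass = model.body_mass.copy()
rng = np.random.default_rng(0)

def randomized_rollout(steps=500):
    """Perturb friction and masses, then simulate one episode."""
    model.geom_friction[:, 0] = rng.uniform(0.4, 1.2, model.ngeom)
    model.body_mass[:] = base_mass * rng.uniform(0.8, 1.2, model.nbody)
    data = mujoco.MjData(model)
    for _ in range(steps):
        mujoco.mj_step(model, data)
    return data.qpos.copy()  # final pose, e.g. for computing a reward

final_poses = [randomized_rollout() for _ in range(10)]
```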
Therefore, another "source-opening" idea has attracted much attention: pre-training on everyday human videos. The countless first-person videos of daily life and work on YouTube contain an enormous amount of information about object properties, physical common sense, and manipulation skills. Letting AI models watch these videos at scale can teach them, without supervision, basic physical common sense such as "which things are fragile" and "how to open a door." This has become a shortcut for bridging the simulation gap and injecting human prior knowledge.
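One common self-supervised recipe for this kind of video pre-training is next-frame prediction: if a model must predict what a scene looks like a moment later, it is pushed to internalize coarse physical regularities (falling, sliding, doors swinging). A minimal PyTorch sketch, with a toy network and random tensors standing in for a real video loader, might look like this:

```python
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    """Tiny convolutional net: given frame t, predict frame t+1."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, frame):
        return self.net(frame)

model = FramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def fake_video_batches(n=100, batch=8):
    """Stand-in for a real loader yielding (frame_t, frame_t_plus_1)."""
    for _ in range(n):
        clip = torch.rand(batch, 2, 3, 64, 64)
        yield clip[:, 0], clip[:, 1]

for frame_t, frame_next in fake_video_batches():
    loss = nn.functional.mse_loss(model(frame_t), frame_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
```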
For example, cutting-edge explorations such as NVIDIA's GR00T model practice a hybrid recipe of "human-video pre-training + simulation fine-tuning + real-robot fine-tuning." This may hint at the eventual solution to Physical AI's data problem: a "trinity" data ecosystem composed of human experience, virtual simulation, and physical interaction.
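Tying the three sources together, such a "trinity" schedule could be organized like the stub pipeline below. The function bodies are stubs and every budget number is a placeholder; this is not NVIDIA's actual GR00T recipe.

```python
def pretrain_on_human_videos(policy, hours):
    # Stage 1: cheap and abundant, teaches broad physical common sense.
    print(f"stage 1: pre-training on ~{hours:,} h of first-person video")
    return policy

def finetune_in_simulation(policy, episodes, randomize=True):
    # Stage 2: task skills at near-zero marginal data cost, with
    # domain randomization to soften the sim-to-real gap.
    print(f"stage 2: {episodes:,} simulated episodes, randomize={randomize}")
    return policy

def finetune_on_real_robot(policy, trajectories):
    # Stage 3: scarce, expensive real interaction, used last and
    # sparingly to close the residual gap.
    print(f"stage 3: {trajectories:,} real-robot trajectories")
    return policy

policy = {}
policy = pretrain_on_human_videos(policy, hours=100_000)
policy = finetune_in_simulation(policy, episodes=1_000_000)
policy = finetune_on_real_robot(policy, trajectories=5_000)
```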
Chinese Scenarios: Between "Cost-Effectiveness" and "Strategic Depth"
Looking at the global competitive landscape, the development of Physical AI presents an interesting contrast. The United States still leads in basic algorithms, chip architectures, and frontier exploration, on a path full of science-fiction ambition and originality. China's path, by contrast, is deeply marked by its own industrial genes and emphasizes engineering implementation in specific scenarios.
This realism shows first in cost-effectiveness. China's advantage lies in quickly engineering and productizing cutting-edge technology while keeping costs within a range the market accepts, relying on the world's most complete and responsive supply chain. As in the substation scenario at the beginning, the technology integration, cost control, and deployment efficiency behind it are the keys to the implementation of Physical AI in China. It may not be the first to invent a given algorithm, but it is often the first to apply it stably and cheaply on factory assembly lines, in logistics warehouses, or in power-grid inspection. In the early stage of Physical AI's move from the laboratory into industry, that ability is a powerful force for market penetration.
Second is "strategic depth." Unlike countries that focus on basic research and free-market exploration, China provides clear application scenarios and industrial channels for Physical AI through top-level design. The "Artificial Intelligence +" initiative and "embodied intelligence" have been written into the government work report, which means a series of large-scale, complex, demand-driven "training grounds" and "testing fields" has been systematically opened up, from smart power grids to smart agriculture, from flexible production lines to urban management. The goal is specific: a 70% penetration rate of intelligent terminals by 2027. This sets a realistic yet imaginative coordinate system for the evolution of Physical AI.
Of course, this path has its own challenges. Will over-emphasizing application and cost-effectiveness weaken long-term investment in more fundamental, revolutionary original technologies? How can the efficiency of "concentrating resources on major goals" be balanced against the vitality of grassroots innovation? These questions need continuous consideration.
The Long Road to "Generalization"
The ultimate dream of Physical AI is "generalization": an intelligent agent that, like a human, can quickly adapt to environments and tasks it has never seen. However, we may be farther from this goal than we think. The real-world problem we must face now is that there is no single-point breakthrough in industrial applications of the technology, only continuous, hard-won incremental advances on every front: perception, control, planning, materials, and energy.
When an intelligent agent that can easily lift hundreds of kilograms moves autonomously through a crowd, any decision error has physical consequences. Interpretability, safety redundancy, and ethical norms, which could be partly ignored in the digital-AI era, therefore become lifelines that cannot be crossed in the Physical AI era.
From predicting words to predicting the state of the world, artificial intelligence is truly beginning to leave its virtual cradle, reaching out with mechanical hands to touch and shape humanity's future reality.
2026 is not an end, but it can be regarded as an important milestone: the landing of AI has begun.
This article is from the WeChat official account "Sice Think Tank." Author: Zhang Zijiong. Republished by 36Kr with permission.