
Digging Deep into the "Digital Foundation" of Physical AI: The 100 Billion Dream of Wuyi Vision

By Xiaoxi, 2026-04-29 16:20
China's "king of simulation scenarios" has been admitted into NVIDIA's intelligent-driving ecosystem.

At the 2026 Beijing Auto Show, even in the bustling center of the exhibition hall, automotive industry practitioners could sense a subtle, anxious atmosphere. Two years ago, the show's focus was the large-scale rollout of urban intelligent driving and the industry's entry into the second half of the electrification era. Today, two core concepts, end-to-end and embodied intelligence, appear on the booths of almost every mainstream automaker and supplier.

The dimension of the intelligent-driving competition has changed. Automakers are no longer content with simply adding more lidars. Instead, they repeatedly demonstrate on the big screens at their booths the "human-like sense" of their algorithms in extreme scenarios such as rainy-night intersections and the chaotic weaving of food-delivery scooters. The shift signals that the autonomous-driving race has officially moved from the perception era of "seeing" to the cognition-and-reasoning era of "understanding".

However, beneath the prosperity, anxiety over "idle computing power" is spreading through the industry.

In the logic of intelligent-driving development, chips are the "flour" and data is the "water". Only when the two are fully blended and "kneaded" through thousands of adjustments can they ferment into the "bread" that feeds the evolution of algorithms. Over the past two years, many automakers have done everything they could to hoard chips and expand intelligent-computing centers, trying to build competitiveness on a mountain of "flour". They soon found, however, that without high-quality "pure water", data that triggers causal logic and carries physical annotations, the utilization rate of that computing power is dismally low. Many automakers now face the embarrassment of computing power that "won't run, won't run smoothly, and won't run at full load": it idles in expensive cabinets and never converts into real driving capability.

This embarrassment essentially stems from the "unusability" of real-world data. Automakers collect petabytes of data every day through road-test fleets, but that data can usually only be replayed; it cannot generate interactions. It is like a pre-recorded movie: whatever decision the algorithm makes, the plot on screen never changes. Data that lacks physical feedback cannot produce the effective training experience the algorithm needs most.

In the search for "effective data", the industry has quickly reached a new consensus: intelligent-driving simulation is no longer an optional testing tool but the core foundation of the data closed loop in the era of physical AI. Against this backdrop, the world model has gone from a term in cutting-edge laboratories to a must-have for the entire industry.

The competition of the physical-AI era has evolved into a battle over scenarios and technology. Whoever can build a virtual world that obeys physical causal laws, supports interaction, and generalizes will hold the "distribution rights" over where computing power lands. At this critical moment, Wuyi Vision officially released SimOne 4.0, the product of nine years of iteration: an end-to-end intelligent-driving simulation platform rebuilt for the era of physical AI. Wuyi Vision's logic is straightforward. By building a native foundation for the world model, it hands automakers sitting on large amounts of computing power a "physics engine" that converts idle compute into virtual training assets which obey real physical laws and support causal interaction, letting AI complete the hardest "last mile" in a virtual laboratory before entering the real physical world.

China's "Scenario King" Enters NVIDIA's Intelligent Driving Ecosystem

In the fiercely competitive physical-AI race, why has Wuyi Vision, a company with a 53.5% market share, become the irreplaceable point of intersection? The answer lies in an official announcement from NVIDIA and in the mass-production schedules of Chinese automakers.

This irreplaceability begins with NVIDIA's endorsement. At NVIDIA's GTC conference in 2026, Wuyi Vision was the only Chinese company on the list of partners in high-level intelligent-driving simulation.

In the view of Wu Xinzhou, NVIDIA's global vice president, autonomous driving has officially entered the physical-AI stage. Perception AI answered what the world looks like now; physical AI must answer how the world will change next and how it will react once the vehicle acts. Accordingly, NVIDIA has broken its DRIVE full-stack assisted-driving platform into a "five-layer cake" production system.

In this system, DRIVE Hyperion, a vehicle-side reference platform for mass production (carrying the Thor chip, a sensor suite, and the software stack), handles real-vehicle deployment, and Alpamayo is the end-to-end autonomous-driving model. The key infrastructure supporting the training-and-simulation closed loop is the Omniverse simulation platform (including the NuRec neural-reconstruction component) and the Cosmos world foundation-model platform. Yet even with top-tier chips and foundation models, when NVIDIA faces China's extremely complex local road conditions, unstructured traffic, diverse traffic participants, and deep engineering rollout, it urgently needs a partner who understands Chinese scenarios.

This explains why Wuyi Vision became the only Chinese partner in high-level intelligent-driving simulation that NVIDIA officially announced at GTC. The logic of their technical complementarity is clear. NVIDIA contributes its leading NuRec neural-reconstruction technology, which solves how quickly real-world scenes can be digitized; Wuyi Vision supplies scene editing, randomized generalization, complex traffic-behavior modeling, and high-precision dynamics models. Together they resolve a "deadlock" of the intelligent-driving industry: the non-interactivity of real collected scenario data. Previously, fleet data was like a DV tape that could only be fast-forwarded and rewound; now, through the combination of their technologies, that tape becomes a 3D scene that can be interacted with in real time and edited at will.

Cooperation between 51Sim and NVIDIA's intelligent driving simulation products

A deeper business intent: Wuyi Vision, the leading player with a 53.5% share of China's high-level intelligent-driving simulation market, has in effect become the "customer converter" for NVIDIA's computing-power ecosystem. When Chinese automakers adopt Wuyi Vision's toolchain to build the "scalable, replicable autonomous-driving production system" Wu Xinzhou describes, that demand naturally flows to NVIDIA's computing infrastructure.

Wu Xinzhou presented a counter-intuitive statistic: autonomous-driving mileage accounts for only 0.006% of total driving mileage worldwide. Large-scale application has not yet opened up, and the engineering threshold of safety verification is the main obstacle. For automakers on the eve of mass production, Wuyi Vision is therefore not just a technology supplier but the "infrastructure" that secures safe mass production of intelligent-driving systems. In the new edition of the "Review Requirements for the Access of Road Motor Vehicle Production Enterprises", simulation-verification capability is explicitly listed as a core indicator of an enterprise's intelligent-driving competence and written into the access review as a key element.

This policy orientation highlights the value of the "pass" Wuyi Vision has accumulated over the years. Wuyi Vision currently has 100% cooperation coverage across China's six major national-level laboratories and authoritative testing institutions. When an automaker's intelligent-driving system heads for mass production and must pass national-level access tests, a verification system built on 51Sim is effectively the industry-wide "yardstick": using it means aligning with the authoritative standards, which sharply cuts communication and compliance costs.

Partnering with Wuyi Vision also lets automakers activate their dormant data assets. The road-test fleets automakers built at huge cost generate staggering volumes of data every day, yet the asset-utilization rate is extremely low. SimOne 4.0 plays the role of "turning stone into gold": it converts raw, static video-stream data into runnable, extensible virtual assets, turning "dead data" back into "living scenarios" and multiplying the value of every kilometer of road testing a hundredfold.

In addition, deep process embedding raises switching costs dramatically. After nine years of industrial accumulation, 51Sim's algorithmic logic is woven into the underlying R&D processes of many automakers. In 2026, a crucial year for the explosion of intelligent driving, replacing the simulation foundation is not simply swapping software; it means rebuilding the entire verification logic from scratch. In a delivery race where every second counts, no automaker is willing to take that time risk now.

Wuyi Vision's ecological-niche value also shows in its ability to absorb the overflow of domestic computing power. Domestic GPUs have long faced a pain point: they have compute but lack large-scale, high-frequency commercial application scenarios. Many chips post impressive lab benchmarks yet "maladapt" the moment they enter real operating environments.

The turning point came from a long-term ecosystem layout. As early as 2021, Moore Threads became a shareholder of Wuyi Vision through investment. This dual binding of "equity + business" allowed SimOne 4.0 to complete full-link systematic adaptation with domestic GPUs such as Moore Threads' MTT S5000 from the very start of its development, giving domestic chips a real, high-frequency, deployable "training ground". This full-link chain from perception mining to simulation inference not only proves the productivity value of domestic computing power but also helps it seize a first-mover advantage on the physical-AI track amid the wave of domestic substitution.

Moore Threads developers announce the cooperation with 51Sim

From Drawing Tools to Native Foundation, Reconstructing "Physical Intuition"

Behind this deep penetration, from ecological niche into development workflows, lies sustained generational leadership in technology. SimOne 4.0 is the exemplar of Wuyi Vision's technical capability. From the perspective of technological evolution, its release marks the simulation platform's leap from tool to foundation: the early SimOne 1.0 was closer to a fine drawing tool, while today's 4.0 has evolved into the native foundation of the physical-AI era.

Wu Xinzhou put forward an extremely important judgment: the world model is the most essential part of autonomous driving. In his view, the VLA model solves the reasoning chain from vision and language to action, while the world model fills in scenario evolution, the consequences of actions, and physical feedback. This explains SimOne 4.0's core evolution: built-in physical intuition lets AI not only "see" the picture but also "understand" causality in the virtual environment.

In the world SimOne 4.0 builds, the fractured trajectory of objects after a collision and the wet-slip coefficient of a rainy road surface are not pre-set animations but the results of strict deduction under real physical laws. This deep restoration of physical law gives AI a logically consistent "training ground" in which to complete its cognition in the virtual laboratory before entering the real world.

Achieving this physical intuition required a full-link rebuild of SimOne 4.0's technical architecture. The first step is reviving dormant "dead data". In the traditional model, road-test data usually serves only as playback material and lacks interactive value. The data layer of 4.0 builds a "reconstruction + generation" system that deeply integrates neural rendering with 4DGS technology and can convert road-test video into editable, runnable simulation assets in one click. A real accident that happened at a highway junction yesterday, for example, can be "cloned" into the system today and spawn countless variant scenarios in which the AI repeatedly practices avoiding the risk.
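The "clone a scene, then derive variants" workflow described above can be pictured as parameter-space sampling around a reconstructed base scenario. Below is a hypothetical sketch in Python; the scenario fields, value ranges, and function names are invented for illustration and are not 51Sim's actual asset format:

```python
import random

# Illustrative base scenario "cloned" from one real road-test event.
# Field names and ranges are hypothetical, chosen only to show the idea.
BASE_SCENARIO = {
    "type": "highway_merge_accident",
    "ego_speed_mps": 27.0,   # ~97 km/h
    "cut_in_gap_m": 18.0,    # distance at which another car cuts in
    "road_friction": 0.85,   # dry asphalt
    "time_of_day_h": 14,
}

def generalize(base: dict, n: int, seed: int = 0) -> list:
    """Sample n variant scenarios by perturbing physically meaningful parameters."""
    rng = random.Random(seed)  # seeded for reproducible scenario suites
    variants = []
    for _ in range(n):
        v = dict(base)
        v["ego_speed_mps"] = round(base["ego_speed_mps"] * rng.uniform(0.8, 1.2), 1)
        v["cut_in_gap_m"] = round(base["cut_in_gap_m"] * rng.uniform(0.5, 1.5), 1)
        v["road_friction"] = round(rng.uniform(0.4, 1.0), 2)  # wet to dry surface
        v["time_of_day_h"] = rng.randrange(0, 24)
        variants.append(v)
    return variants

variants = generalize(BASE_SCENARIO, n=100)
```

A real scenario asset would of course carry far richer state (trajectories, sensor configurations, traffic participants), but the principle is the same: one recorded event becomes a distribution of trainable cases.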


With data assets revitalized, large-scale training efficiency becomes the key to evolution. SimOne 4.0's training layer establishes a standardized data-processing and scheduling system that supports highly concurrent task execution across multiple GPU architectures. Notably, it has completed deep adaptation and optimization with Moore Threads' MTT S5000, proving that domestic computing power is not merely usable but can carry ultra-large-scale world-model training. This scaling of compute ensures that the world model and the VLA model iterate at great speed; in essence, it keeps expanding AI's "general cognitive bandwidth", letting AI build human-like common sense about the physical world through high-frequency virtual training so that it handles complex scenarios more calmly and accurately.

Cooperation between 51Sim and Moore Threads' intelligent driving simulation products

With a large-scale training foundation in place, SimOne 4.0 exhibits a degree of "future-predicting" reasoning ability. Unlike traditional simulation, which depends heavily on hand-modeled paths, its reasoning layer can automatically generate dynamic environments from real data. As the agent's behavior changes within the environment, the system autonomously generates complex interaction processes, so the scenario is no longer a static backdrop but a living world that keeps producing new states in response to the model's behavior. This capacity for dynamic evolution lets AI shed the "simulator feel" at the verification layer: a high-fidelity simulation system driven by real data closes the "credibility gap" between virtual and real, making verification results highly trustworthy and allowing what AI learns by trial in the virtual world to transfer losslessly into mass-production vehicle development.

Finally, the delivery layer turns these technologies into engineering capability. SimOne 4.0 adapts to different computing architectures and operating environments, providing full-cycle support from R&D to mass-production verification. The system has already empowered more than a hundred customers across intelligent driving, robotics, and intelligent equipment, and has landed highly reliable verification systems in a range of complex projects.

The trump card underpinning this entire closed loop is the "physical intuition" Wuyi Vision repeatedly emphasizes to the industry. On concrete metrics, the 51World Model's PSNR in digital-twin scene simulation exceeds 35 dB, significantly above the industry's typical level of 30 dB. Meanwhile, its camera and lidar simulation confidence exceeds 92%, its dynamic-simulation confidence exceeds 95%, and the annotation accuracy of its synthetic data surpasses 99.9%. These figures, which are ahead of the
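To put the 35 dB vs. 30 dB comparison in perspective: PSNR (peak signal-to-noise ratio) scores a rendered image against a reference on a logarithmic decibel scale, where every ~6 dB gained corresponds to roughly halving the average pixel error. Below is a minimal sketch of the standard metric in Python with NumPy; the function and test values are illustrative, not 51Sim's implementation:

```python
import numpy as np

def psnr(reference: np.ndarray, rendered: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference image and a rendering."""
    diff = reference.astype(np.float64) - rendered.astype(np.float64)
    mse = np.mean(diff ** 2)  # mean squared error over all pixels/channels
    if mse == 0:
        return float("inf")   # identical images: no noise at all
    return 10.0 * np.log10((max_value ** 2) / mse)
```

On 8-bit images, a uniform error of 4 gray levels per pixel works out to about 36 dB, while 8 levels gives about 30 dB, which is the scale of the gap the paragraph above describes.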