
Knocking on the Door of L3: Where Will the New Decade of Intelligent Assisted Driving Lead?

极智GeeTech · 2025-07-16 11:39
This path has no end; it is instead a new starting point for humans to coexist with intelligence.

At the inception of any technology, debates about its development path always abound, and intelligent assisted driving technology is no exception.

Should development proceed incrementally from Level 2 to Level 4, or leap straight to the Level 4 endpoint? Should the industry follow the ultimate simplification of the pure-vision route, or insist on the redundant safety of multi-sensor fusion? How can industry-wide standardization be balanced against automakers' need for customization? And where will the contest between vertical integration and open alliances lead?

These unresolved questions are themselves evidence of the vitality of technological innovation. Through continuous exploration and practical verification, the future evolution map of intelligent assisted driving is gradually coming into focus.

Computing Power Leap under the Three-Level Architecture

In the past decade, the computing power of intelligent assisted driving systems has made a leap from quantitative to qualitative change. The core support is the deep implementation of the three-level architecture of "cloud training, edge inference, vehicle-end execution". By migrating the computing load from the resource-limited vehicle end to the cloud, the potential of exponentially growing computing power is unleashed.

As the "super brain" of intelligent assisted driving, the cloud has become the core carrier for model training, data closed - loop, and algorithm iteration. In the fields of end - to - end model training and Corner Case scenario mining, the scale of cloud computing power directly determines the evolution speed of intelligent assisted driving systems.

Globally, Tesla's cloud computing power has exceeded 88.5 EFLOPS. Among domestic players, Geely Xingrui Intelligent Computing Center leads with 23.5 EFLOPS, Huawei has exceeded 10 EFLOPS, Li Auto and Xiaomi have both reached 8.1 EFLOPS, and XPeng plans to increase its cloud computing power from 2.51 EFLOPS to 10 EFLOPS in 2025 to build a more powerful training base.

As the "edge terminal" for real - time decision - making, the computing power of the vehicle end is exponentially increasing towards the level of thousands of TOPS. Level 2 tasks such as automatic parking and urban NOA require dozens to hundreds of TOPS, while Level 3 and above levels need to break through the 1000 - TOPS threshold to meet the real - time inference requirements of end - to - end models.

In the current chip matrix, NVIDIA's Thor-X-Super chip is poised for launch, and its 2,000 TOPS of computing power will reset the performance benchmark. NIO's NX9031 has broken through 1,000 TOPS, XPeng's Turing chip follows closely at 750 TOPS, and Huawei's Ascend 910 and Horizon's J6P build mid-range computing power barriers at 512 TOPS and 560 TOPS respectively.

Among popular intelligent assisted driving solutions, the vehicle-end computing power of NIO's NIO Pilot reaches 1,016 TOPS; Tesla's FSD follows at 720 TOPS; XPeng's XNGP, Li Auto's AD Pro, Xiaomi's Xiaomi Pilot Max, BYD's Tian Shen Zhi Yan A, and Zeekr's Qian Li Hao Han H7 are comparable, all at 508 TOPS. By comparison, the vehicle-end computing power of Huawei's ADS 3.0 is lower, at about 200 TOPS.

As the "nerve node" of vehicle - cloud collaboration, edge computing power undertakes the important task of real - time collaboration of data from the vehicle end, roadside, and cloud. Its standardization process directly affects the implementation efficiency of Level 3 and above intelligent driving. By improving the accuracy of local environment perception and the response speed of traffic optimization, edge computing power is becoming an indispensable technological foundation for high - level intelligent driving.

High-Level Intelligent Assisted Driving Moves toward "Technological Equalization"

Intelligent assisted driving in China keeps breaking through along a gradient of scenario complexity: from basic functions such as automatic lane changes and ramp passing on structured roads with highway NOA, to complex urban scenarios such as traffic-light recognition and unprotected left turns with urban NOA, and finally to full-scenario door-to-door (D2D) connectivity, which builds a full-link intelligent driving system covering "underground garage, urban area, highway" and integrates scenarios such as automatic charging and multi-floor parking.

This evolution trajectory marks not only a fundamental shift in the technological paradigm from "rule-driven" to "data-driven", but also a leap in the value of high-level intelligent driving from "specific-scenario assistance" to "full-journey intelligent empowerment". Its core driving force comes from coordinated breakthroughs across algorithms, data, and computing power.

On the computing power side, the three-level architecture delivers a quantitative leap. On the data side, the twin engines of real and synthetic data break the bottleneck of long-tail scenarios. On the algorithm side, the evolution from "rule engine + stacked modules" to "data engine + end-to-end integration" optimizes the entire "perception-decision-control" chain.

In 2025, the trend of "technological equalization" of high - level intelligent driving has significantly accelerated. BYD has extended the high - speed NOA function to models priced at 80,000 yuan, XPeng has covered the urban NOA to the 150,000 - yuan market, and other mainstream automobile manufacturers have also extended the urban NOA function to models priced between 100,000 and 200,000 yuan. The technological equalization of high - level intelligent assisted driving is becoming more and more obvious.

This equalization is the combined result of software efficiency gains, hardware cost reductions, and scale production. Driven by it, the market scale and penetration rate of Level 2 and Level 2+ high-level intelligent assisted driving are growing explosively: the penetration rate of Level 2+ is estimated to jump from 8% in 2024 to 15% in 2025, and the high-level intelligent driving market for passenger cars is entering an explosive growth cycle.

Data Closed Loop Promotes "Cost Reduction" of Technology Application

Currently, intelligent assisted driving systems are undergoing a paradigm shift from "stacking redundant hardware" to "algorithm-defined perception", and the data closed-loop ecosystem has become the core carrier of this transformation. Once a vehicle's multi-modal sensors begin collecting data, the gears of this "data machine" start to turn, mesh, and run in coordination.

First, sensor data is standardized in format on the vehicle end, pre-processed through caching, and then automatically annotated with metadata: driving behavior, environmental parameters, and real-time target labels. When the labels meet specific conditions, event packaging is triggered.

After the data reaches the edge, a rule engine and a lightweight model filter it at different levels of precision, retaining high-value data relevant to intelligent driving. This data is then compressed, optimized, stored in tiers, and sorted into upload queues of different priorities according to urgency. Along the way it is also desensitized, and the transmission is monitored to ensure compliance and security.
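A minimal sketch of this vehicle-to-edge triage step is shown below. The trigger labels, priority levels, and field names are hypothetical; the real rule engines and lightweight models are proprietary, so everything here is illustrative.

```python
import heapq
from dataclasses import dataclass, field

# Illustrative trigger conditions: package an event when any of these labels appear.
TRIGGER_LABELS = {"hard_brake", "cut_in", "planner_disagreement"}
PRIORITY = {"hard_brake": 0, "cut_in": 1, "planner_disagreement": 1}  # 0 = most urgent

@dataclass(order=True)
class UploadItem:
    priority: int
    payload: dict = field(compare=False)

def desensitize(frame: dict) -> dict:
    """Drop or mask fields that could identify people before data leaves the edge."""
    clean = dict(frame)
    clean.pop("cabin_audio", None)
    clean["plates"] = ["<masked>"] * len(frame.get("plates", []))
    return clean

def triage(frames: list[dict]) -> list[UploadItem]:
    """Keep only trigger-matching frames and order them into a priority queue."""
    queue: list[UploadItem] = []
    for f in frames:
        hits = TRIGGER_LABELS & set(f["labels"])
        if not hits:
            continue  # low-value frame: stays local, never uploaded
        prio = min(PRIORITY[h] for h in hits)
        heapq.heappush(queue, UploadItem(prio, desensitize(f)))
    return [heapq.heappop(queue) for _ in range(len(queue))]
```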

Finally, the data closed loop is completed in the cloud. There, the multi-source data transmitted from the edge is ingested, labeled, and indexed, and is automatically cleaned with clustering algorithms to remove conflicting samples.

Next, a 4D annotation toolchain and data synthesis tools annotate and augment the retained valid data. After distributed model training, simulation verification and deployment, value evaluation, and compliance auditing, the data formally completes its closed-loop feedback and iteration.
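As a toy stand-in for the clustering-based cleaning step, the sketch below drops near-duplicate samples by greedy cosine-similarity filtering over scene embeddings. Production pipelines use richer clustering, but the intent, keeping one representative per redundant group, is the same.

```python
import numpy as np

def deduplicate(embeddings: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedy near-duplicate removal: keep a sample only if its cosine
    similarity to every already-kept sample stays below the threshold."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    for i, v in enumerate(normed):
        if all(float(v @ normed[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Usage: indices of retained scenes from 1,000 hypothetical 128-d embeddings.
kept_ids = deduplicate(np.random.randn(1000, 128))
```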

With the continuous growth of edge and cloud computing power, the deep integration of real and synthetic data has become the key path to high-level autonomous driving. As dynamic data distillation and multi-modal feature alignment algorithms mature, the system's dependence on hardware redundancy is gradually decreasing, while scenario generalization has improved markedly.

This trend is directly driving a structural decline in the cost of core perception components. Over the past year, the average price of lidar has dropped from 2,500 yuan to 1,200 yuan, a decrease of 52%. Prices of millimeter-wave radars, in-vehicle cameras, and ultrasonic radars have fallen 25%-31%, clearing the cost obstacles to large-scale deployment of high-level intelligent driving.

Positioning technology is evolving in parallel, moving from the traditional mode that relies on lidar point-cloud matching and high-precision maps to a new stage of map-less, end-to-end models.

Fusing multiple cameras to generate a bird's-eye view replaces pre-built high-precision maps and sharply reduces map maintenance costs. Cloud-based simulation environments built from massive vehicle-end data continuously strengthen the model's generalization to dynamic scenarios. And the growing weight of visual perception further weakens dependence on high-cost hardware such as lidar.
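The geometric intuition behind a camera-derived bird's-eye view can be sketched with classic inverse perspective mapping: project each ground-plane grid cell into the image and sample a pixel. Production BEV networks learn this mapping and fuse many cameras; the single-camera, flat-ground version below is only a didactic approximation with assumed intrinsics K and extrinsics R, t.

```python
import numpy as np

def bev_from_camera(img: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray,
                    x_range=(0.0, 40.0), y_range=(-10.0, 10.0), cell=0.5) -> np.ndarray:
    """Fill a bird's-eye-view grid by projecting ground-plane points (z = 0)
    into one camera and sampling pixels (nearest neighbor)."""
    xs = np.arange(*x_range, cell)
    ys = np.arange(*y_range, cell)
    bev = np.zeros((len(xs), len(ys), img.shape[2]), dtype=img.dtype)
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            p_cam = R @ np.array([x, y, 0.0]) + t  # ground point -> camera frame
            if p_cam[2] <= 0.1:                    # behind the camera or too close
                continue
            u, v, w = K @ p_cam                    # pinhole projection
            u, v = int(u / w), int(v / w)
            if 0 <= v < img.shape[0] and 0 <= u < img.shape[1]:
                bev[i, j] = img[v, u]
    return bev
```

Fusing multiple cameras amounts to repeating this projection per camera and merging the grids, which is exactly the map-free substitution the paragraph describes.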

In the long run, map-less and end-to-end models are only transitional forms. When vehicle-road-cloud collaboration and self-evolving learning become the core of the technology and chip-level integration breaks through, the ultimate form will be global dynamic positioning based on ambient intelligence and generative AI.

The Leap from "Modular" to "End - to - End"

Intelligent driving algorithms are undergoing a revolutionary evolution from modular stacking to end-to-end integration, and architectural innovation has become the core engine of technological breakthroughs.

The end-to-end (E2E) architecture relies on a single neural network to run the entire "perception-decision-control" process and directly output vehicle control commands, with no manual rules in between. Trained on massive data, it can approximate human intuitive decision-making, with markedly better response efficiency and adaptability to complex conditions. However, its generalization ability is gated by the scale of data an automaker can accumulate.
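Structurally, an E2E policy can be as small as one network from pixels to control targets. The toy PyTorch model below is nowhere near a production stack in scale, but it shows the defining property the paragraph describes: no hand-written rules between perception and control.

```python
import torch
import torch.nn as nn

class TinyE2EPolicy(nn.Module):
    """Single network from camera pixels straight to control targets."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # perception
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.head = nn.Sequential(                    # decision + control
            nn.Linear(32 * 16, 128), nn.ReLU(),
            nn.Linear(128, 2),                        # [steering, acceleration]
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(frames))

# Trained by imitation: regress the output toward human control logs.
model = TinyE2EPolicy()
controls = model(torch.randn(1, 3, 128, 128))  # -> tensor of shape (1, 2)
```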

To break through this limitation, the vision-language model (VLM) emerged as an enhancement module. Combining visual perception with natural language understanding, it parses complex traffic semantics and generates decisions through chained reasoning. Together with the end-to-end architecture, it forms a "fast-slow collaborative" decision-making system that improves safety in long-tail scenarios and corner cases.
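A rough sketch of this fast-slow split, with stub functions standing in for both systems: the fast E2E path runs every cycle, while the slow VLM-style reasoner ticks occasionally and only bounds the fast path's output. The stubs and the 1-in-30 cadence are assumptions for illustration.

```python
def fast_policy(frame):
    """Fast path: E2E controller output per frame, millisecond budget (stub)."""
    return {"steer": 0.0, "accel": 1.0}

def slow_reasoner(frame):
    """Slow path: VLM-style semantic reasoning, runs rarely (stub).
    e.g. 'wet road, school zone ahead' -> advise a lower acceleration ceiling."""
    return {"accel_cap": 0.3}

advice = {"accel_cap": float("inf")}
for step, frame in enumerate(range(300)):   # stand-in for a camera stream
    if step % 30 == 0:                      # slow system ticks at ~1/30 the rate
        advice = slow_reasoner(frame)
    cmd = fast_policy(frame)
    cmd["accel"] = min(cmd["accel"], advice["accel_cap"])  # slow bounds fast
```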

However, a VLM consumes a large amount of computing power and cannot respond as quickly and intuitively as the end-to-end architecture. In addition, its heavy dependence on high-precision maps increases maintenance costs and reduces generalization ability.

Intelligent driving algorithm developers have therefore gone a step beyond the multi-stage "VLM + end-to-end" architecture and designed a more powerful combination: the Vision-Language-Action (VLA) model.

VLA is an advanced version of VLM. It integrates the action modality to build a unified "vision-language-action" model and completes the end-to-end closed loop of "perception-decision-execution".

Compared with VLM and "VLM + end - to - end", VLA takes into account the action modality and forms a unified model that integrates vision, language, and action. It directly embeds multi - modal information into the driving decision - making chain, which significantly reduces the dependence on data while improving the generalization ability.

More importantly, its reasoning chain is traceable from end to end, and that reasoning and working logic can be presented intuitively to users through the human-machine interface, strengthening users' confidence in the system.

Although VLA has significant advantages, it still faces multiple challenges. The requirement to integrate the visual encoder, language encoder, and action encoder in the same architecture greatly increases the difficulty of engineering development.

At the same time, the architecture's heavy data demand and collection costs make initial training extremely expensive, and its multi-modal perception offers insufficient support for embodied capabilities such as force feedback and physical interaction.

In addition, real-time computing requirements strain the current computing power of in-vehicle chips, and the risk of black-box decision-making reduces interpretability and complicates debugging.

Therefore, until VLA's R&D threshold and implementation costs are brought under effective control, the multi-stage "VLM + E2E" architecture remains the more inclusive, mainstream choice.

The World Model May Become the Engine of "Human-Like Intelligent Driving"

To a large extent, more advanced cloud algorithms will help reduce the training difficulty of the vehicle-end VLA architecture and quickly enhance its generalization ability. Currently, cloud algorithms themselves are also undergoing major technological innovations.

From data-driven imitation learning to generative world models capable of modeling the physical world, the underlying logic of cloud-algorithm development is clear: use generative AI to synthesize virtual scenarios and construct a closed-loop simulation system spanning hundreds of millions of kilometers. Such a system not only covers long-tail scenarios that occur in the real world, but can also use spatio-temporal evolution prediction to simulate how the environment changes in response to specific vehicle actions in unknown scenarios (extreme weather, traffic accidents, and the like).

This will significantly reduce the dependence on real-world driving data during model training. While continuously generating multi-modal training data, the system can incorporate a reinforced self-learning mechanism and ultimately optimize driving strategies toward human-like behavior.
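The generation loop itself is easy to sketch: sample long-tail conditions, then roll a dynamics model forward under candidate ego actions. Below, world_model_step is a hand-written toy standing in for the learned world model; only the structure of the loop is the point.

```python
import random

def world_model_step(state: dict, action: str) -> dict:
    """Toy stand-in for a learned world model: predict the next scene state
    from the current state and the ego action."""
    nxt = dict(state)
    nxt["t"] = state["t"] + 0.1
    if action == "brake":
        nxt["ego_speed"] = max(0.0, state["ego_speed"] - 2.0)
    return nxt

def synthesize_episode(seed: int) -> list[dict]:
    rng = random.Random(seed)
    # Sample long-tail conditions the real fleet rarely records.
    state = {"t": 0.0,
             "weather": rng.choice(["fog", "storm", "night"]),
             "ego_speed": rng.uniform(5, 30)}
    trace = [state]
    for _ in range(50):
        action = rng.choice(["keep", "brake", "steer_left"])
        state = world_model_step(state, action)
        trace.append(state)
    return trace  # becomes training data for the on-vehicle policy
```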

On the other hand, the continuously evolving vehicle-cloud collaborative distillation mechanism will effectively improve the generalization of the vehicle-end VLA architecture. Real-time data flowing back from vehicles feeds the cloud model, helps it synthesize long-tail scenarios, and drives iteration, ultimately forming a mutually reinforcing closed loop of cognitive evolution.
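The cloud-to-vehicle half of this loop typically takes the form of knowledge distillation. The article does not specify the mechanism, so the sketch below shows one standard possibility: a temperature-scaled distillation loss pulling a small on-vehicle student toward a large cloud teacher, with random tensors in place of real outputs.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 T: float = 4.0) -> torch.Tensor:
    """Soften both distributions with temperature T and pull the small
    on-vehicle student toward the large cloud teacher (Hinton-style)."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T

# Usage with placeholder logits for a batch of 8 driving decisions, 10 classes.
loss = distill_loss(torch.randn(8, 10), torch.randn(8, 10))
```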

It should not be overlooked that the cloud-based world model deeply integrates intelligent driving elements such as sensor data, traffic rules, and practical experience. In essence, it is an AI framework that can understand, reason about, and predict the driving environment.

With its strengths in perception and decision optimization, the cloud-based world model can supply semantic information to improve the environment recognition of intelligent driving systems, predict the behavioral intentions of surrounding traffic participants, and assist in decision planning and vehicle control. To some extent, it may eventually replace the vehicle-end vision-language model, using cloud computing power to give the vehicle end more accurate scenario reasoning.

Currently, many industry players have begun to focus on cloud computing power. For example, Huawei's ADS 4.0 publicly claims that it has completed simulation verification of L3-level intelligent assisted driving