From technological routes to personnel changes, why has the intelligent driving industry started creating "new buzzwords" again?
High-level intelligent driving, mapless NOA, end-to-end, VLA, WEWA architecture, NWM... Every few months, new terms emerge from automakers and the intelligent driving industry, tracking the rapid iteration of intelligent driving technology.
However, this rapid iteration has created problems of its own. Cars purchased a year ago are no longer compatible with the latest technologies, and even users' understanding can no longer keep up with the new terms. Undercurrents are also stirring within automakers: from rule-based systems to end-to-end, and then to world models and physical-AI architectures, the intelligent driving departments of the new carmakers face constant personnel changes and executive departures.
The intelligent driving industry generally believes that the period from the fourth quarter of this year through the first half of next year is another critical window for deploying assisted driving technology. As world-model and VLA upgrades roll out, the leading positions can change at any time, whether among automakers committed to in-house development or solution providers such as Momenta, DeepRoute.ai, and WeRide.
From Rules and End-to-End to World Models
A rule-based assisted driving system consists of four core modules: perception, prediction, planning, and control, commonly called the "modular solution" in the industry. Its advantage is that it is easy to mass-produce, but its disadvantages are equally obvious: the four independent modules work in series, producing long latencies and large information losses. As a result, vehicles are often constrained by their limited ability to negotiate with other road users and get stuck in difficult situations.
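The serial hand-off described above can be sketched in a few lines of Python. Every name, shape, and threshold here is a hypothetical stand-in for illustration, not any automaker's actual stack; the point is only that each stage sees nothing but its predecessor's compressed output.

```python
# Minimal sketch of a rule-based "modular" pipeline: four stages run in
# series, so latency and information loss accumulate at each hand-off.
# All names and numbers here are illustrative, not any vendor's interfaces.
from dataclasses import dataclass

@dataclass
class Detection:
    obj_id: int
    position: tuple  # (x, y) in metres, ego frame
    velocity: tuple  # (vx, vy) in m/s

def perceive(sensor_frame):
    """Stage 1: turn raw sensor data into a list of tracked objects."""
    return [Detection(0, (20.0, 0.0), (-2.0, 0.0))]  # toy output

def predict(detections, horizon_s=3.0):
    """Stage 2: extrapolate each object forward (constant velocity here)."""
    return {
        d.obj_id: (d.position[0] + d.velocity[0] * horizon_s,
                   d.position[1] + d.velocity[1] * horizon_s)
        for d in detections
    }

def plan(predictions, ego_speed=10.0):
    """Stage 3: a crude rule - brake if any object ends up within 15 m."""
    if any(x < 15.0 for x, _ in predictions.values()):
        return "brake"
    return "keep_lane"

def control(decision):
    """Stage 4: map the symbolic decision to an actuation command."""
    return {"brake": -3.0, "keep_lane": 0.0}[decision]  # m/s^2

# Each stage only sees its predecessor's compressed output - the source
# of the long delays and information losses the article describes.
accel = control(plan(predict(perceive(sensor_frame=None))))
print(accel)  # the toy object closes to 14 m, so the pipeline brakes
```

Because `plan` receives only object positions, any richer context the perception stage observed (lane texture, driver intent cues) is already gone by the time a decision is made, which is the structural weakness the article attributes to the modular route.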
In August 2023, Tesla launched the end-to-end-based FSD V12 beta, and "end-to-end" became a hot topic in China's intelligent driving circle. Huawei, XPeng, NIO, and Li Auto followed suit one after another, and solution providers such as Momenta also rolled out end-to-end solutions.
However, whether rule-based or end-to-end, both approaches ultimately amount to imitation learning: feeding large amounts of human driving data into the system so it learns how to drive. Hence the processes of data collection, annotation, and cleaning, whose core purpose is to make the data intelligible to the learning model and thereby improve the efficiency and accuracy of learning.
Fundamentally, this method resembles how humans learn to drive, with one key difference: the system's early learning and later correction are both passive, whereas humans learn and correct actively.
For example, when turning left at an intersection with two left-turn lanes, human drivers generally prefer the lane with fewer vehicles, and their following distances vary as well. Rule-based and end-to-end assisted driving systems, by contrast, generally choose the innermost lane. Similarly, when a vehicle changing lanes into a ramp hits a traffic jam, a human driver can pick the right moment to merge, while the system can easily leave the vehicle stuck in place.
Another problem is that a system without the ability to actively learn and correct cannot handle all possibilities. In the words of Liu Xianming, the person in charge of XPeng's Autopilot Center organization, "Even if we can solve 99% of corner cases every day, 99% today and 99% tomorrow, we still can't solve them all. Unless we can exhaust all possibilities, it will take ages to achieve L4. So this is an insoluble problem."
Li Xiang, the founder of Li Auto, explained that end-to-end has no understanding of the real physical world: it only receives three-dimensional images from the vision system and outputs a motion trajectory based on the vehicle's speed. End-to-end is sufficient for most generalized scenarios, but it runs into trouble in particularly complex situations it has never learned. This also helps explain why, since last year, mass-produced end-to-end systems have all been two-stage rather than one-stage.
To address this insoluble shortcoming of end-to-end, Li Auto added a VLM (Vision Language Model). However, because these models are open-source, their capabilities in traffic scenarios are very limited, and they can only play a narrow auxiliary role, such as recognizing a red-light countdown and outputting motion signals in combination with the navigation map.
Practice has shown that imitating human driving cannot push assisted driving systems through to L3, because imitation itself is flawed: it requires a clear object to imitate, and it would need to exhaust every possible behavior worth imitating. It is like a nesting doll; a smaller one always appears.
"Since the path of imitation doesn't work, we should go back to the origin," said Liu Xianming. "Autonomous driving is not simply about imitation learning. Instead, we should re-understand the world and truly drive a car like a human being."
Li Xiang also expressed a similar view: the third stage, namely VLA (vision-language-action model), uses combined 3D and 2D vision to view the real physical world and can even understand navigation software and how to operate it. By comparison, a VLM only sees a picture. VLA also has its own brain system: it can understand the physical world it sees and, drawing on its own language system, chain of thought, and reasoning ability, truly execute driving actions like a human.
Li Auto's VLA is also called the "VLA Driver Large Model", and its principle is to translate visual imaging into language and then execute actions. XPeng is even more radical. At the XPeng 2025 Technology Day on November 5th, He Xiaopeng announced that XPeng will directly eliminate the language translation process in the new generation of VLA models. The multi-modal physical signals captured by the cameras will be input, and continuous control signals will be directly output.
XPeng's second-generation VLA still has a reasoning process, but it is implicit in the model rather than carried out by an explicit language model. Eliminating the "L" (language) has two advantages. First, it makes the model simpler and more efficient to run and reduces losses along the information path: signals from video and the IMU (inertial measurement unit) can be mapped directly to continuous control outputs without passing through a language translation step. Second, it opens the door to large-scale self-supervised learning: video collected by the vehicle from the physical world can serve directly as training data, giving the system extremely strong generalization ability.
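The contrast between the two signal paths can be caricatured in a few lines. This is a toy sketch under loose assumptions (random stand-in features, a sign-based "tokenizer", a single linear head); none of these shapes or names reflect XPeng's real model, only the presence or absence of a discrete bottleneck.

```python
# Toy contrast between language-mediated VLA and a direct video+IMU path.
# Shapes and module names are illustrative assumptions, not XPeng's model.
import numpy as np

rng = np.random.default_rng(0)
video_feat = rng.standard_normal(512)   # stand-in for encoded camera frames
imu_feat = rng.standard_normal(16)      # stand-in for IMU (pose) signals

def vla_with_language(video, imu):
    """Classic VLA shape: compress perception into discrete tokens,
    then decode an action from the tokens - lossy by construction."""
    tokens = np.sign(video[:32])            # crude discretisation step
    return float(np.tanh(tokens.mean()))    # steering-like command in [-1, 1]

def vla_without_language(video, imu, w=None):
    """Second-generation style: fuse continuous features and regress
    a control signal directly, with no token bottleneck in between."""
    x = np.concatenate([video, imu])
    w = rng.standard_normal(x.size) / np.sqrt(x.size) if w is None else w
    return float(np.tanh(x @ w))

a = vla_with_language(video_feat, imu_feat)
b = vla_without_language(video_feat, imu_feat)
print(a, b)  # both are bounded, steering-like commands in [-1, 1]
```

The second function also hints at the self-supervision point: because its input is the raw continuous feature stream, any recorded drive is usable as training signal without an intermediate labeling or captioning step.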
Liu Xianming said, "In any overseas market, XPeng no longer needs to re-map and annotate data. As long as there are XPeng cars on the road, it can support the training of the model and quickly support deployment and implementation."
Each company has chosen its own route for the third stage of autonomous driving. Lang Xianpeng, senior vice president of autonomous driving R&D at Li Auto, believes Huawei was the dominant player of the rule-based era, and that it is impossible to beat Huawei with rules. End-to-end, which rode the strategic trend, was once a new technical route but has since become an old, crowded market. "If Li Auto wants to achieve true autonomous driving, it can't keep fighting on this battlefield. It needs to switch to a new one, which is VLA."
In March this year, Li Auto released its VLA technical solution, and discussions about VLA's feasibility have been building since. Jin Yuzhi, CEO of Huawei's Intelligent Automotive Solution BU, says Huawei will not follow the VLA route, because converting video into language tokens and then controlling the vehicle is "a shortcut". Huawei's WEWA architecture is similar to XPeng's second-generation VLA: it also omits the language step and controls the vehicle directly through multi-modal information such as vision, sound, and touch.
Wu Yongqiao, president of Bosch's Intelligent Driving and Control Systems Division in China, listed four difficulties in implementing VLA: aligning multi-modal features is very hard; extracting multi-modal training data is very hard; large language models inevitably "hallucinate"; and the memory bandwidth of current intelligent driving chips was not designed for large models and cannot support the required volume of data transfer and computation.
NIO, like Huawei, has chosen the world model route. Ren Shaoqing, the chief expert and vice president of NIO's autonomous driving R&D, also holds a similar view. He believes that VLA ties language and actions together and still centers around language. The bandwidth of the language model is insufficient to handle the complexity and continuity of the real world.
As Liu Xianming puts it, aligning multi-modal features in a VLA model risks information loss. A nearly six-second video clip XPeng showed contains a wealth of visual, road-condition, and vehicle-motion information. A VLA model would first have to convert that information into language text, and alignment is the process of making the conversion as accurate as possible. "But in fact, even a description of thousands of words loses information compared with the video itself. This is the so-called 'what you see is not what you get'. We believe eliminating the language step is the simplest and most direct way," said Liu Xianming.
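The information-loss argument can be demonstrated with a toy experiment: force a continuous signal through a small discrete "vocabulary" (standing in for language tokens) and measure what survives. The setup is entirely illustrative; real token vocabularies and video features are far richer, but the direction of the effect is the same.

```python
# Toy illustration of "what you see is not what you get": quantising a
# continuous signal through a discrete vocabulary (a stand-in for language
# tokens) necessarily discards information.
import numpy as np

rng = np.random.default_rng(1)
signal = rng.standard_normal(1000)   # stand-in for raw video features

def through_tokens(x, vocab_size):
    """Quantise x to `vocab_size` levels, as a language bottleneck would."""
    edges = np.linspace(x.min(), x.max(), vocab_size + 1)
    ids = np.clip(np.digitize(x, edges) - 1, 0, vocab_size - 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[ids]

for vocab in (8, 256):
    err = np.mean((signal - through_tokens(signal, vocab)) ** 2)
    print(vocab, round(float(err), 5))
# Reconstruction error shrinks as the vocabulary grows, but never reaches
# the zero loss of passing the continuous features straight through.
```

A larger vocabulary (more, finer-grained tokens) narrows the gap, which is why both camps agree scale helps; the disagreement is over whether the residual loss at the language bottleneck matters for driving.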
Whichever route is chosen, they all point toward high computing power, big data, and large models. XPeng's second-generation VLA, planned for delivery to Ultra-version owners, is backed by as much as 2,250 TOPS of computing power from three self-developed Turing AI chips. NIO has also developed its own chips. Within NIO's world model, Ren Shaoqing strongly advocates adding a reinforcement learning model, which he sees as the key to upgrading from short-horizon imitation learning to intelligent agents that can handle long sequences.
Before joining NIO, Ren Shaoqing co-founded the autonomous driving company Momenta with Cao Xudong in 2016. His decision to join NIO in 2020 to lead autonomous driving development, during the company's trough, caused quite a stir. He believes large-scale, high-quality data drives the transformation of AI technology, and this conviction was the core reason he left Momenta for NIO.
Turbulent Internal Organizations
The switch in autonomous driving technical routes actually began at the end of 2023. In November 2024, XPeng pushed its Technology Day, originally planned for October 24th, back to November 7th. Before the Technology Day, the XPeng P7+ was launched; at that time, Li Liyun, newly promoted to vice president and in charge of the Autopilot Center, was the main presenter of the end-to-end technology release.
By then, however, XPeng was already pursuing two R&D routes in parallel internally, the traditional VLA and the innovative VLA, and had built a 30,000-card computing cluster. The so-called innovative VLA was the prototype of today's second-generation VLA, the one that eliminates the "L". But for a long time there was no sign of progress. "At that point, the team leads were almost too embarrassed to attend the monthly and weekly meetings," and He Xiaopeng considered giving up and focusing on the traditional VLA first.
The innovative VLA's breakthrough came in the second quarter. He Xiaopeng received a call from the autopilot team and, at their request, personally tested the VLA build of the time. The result exceeded his expectations. He then decisively abandoned the traditional VLA and committed fully to the second-generation VLA.
Liu Xianming leads the innovative VLA project. He joined XPeng in March 2024, having previously worked on machine learning and computer vision research at Meta and at Cruise, the General Motors subsidiary. He Xiaopeng trusts him deeply; the two talk frequently, and many of their conversations run for hours.
In October 2025, right after the National Day holiday, XPeng announced internally that Liu Xianming would replace Li Liyun as head of the Autopilot Center. The two represent different technical approaches. Li Liyun focused on intelligent driving as a product, emphasizing function delivery and productization; he helped XPeng roll out NGP in hundreds of cities. Liu Xianming focuses on building a world foundation model that can simulate the physical world, giving XPeng's autonomous driving stronger generalization ability and the capacity to go global. The adjustment means XPeng's autonomous driving route has shifted completely from function delivery to the foundation model.
Even earlier, the intelligent driving organizations of Geely, NIO, and Li Auto had undergone major adjustments. Geely integrated several internal intelligent driving R & D teams into Qianli Technology and also introduced external partners as core suppliers.
On September 19th this year, Li Auto reorganized its autonomous driving R&D department into 11 second-level departments, with resources tilted toward VLA. The original model algorithm team was split into a foundation model department, a VLA model department, and a model engineering department, and the heads of all 11 departments report directly to Lang Xianpeng. "This adjustment is meant to push the team to evolve into an AI organization," Li Auto wrote in an internal letter. The company also abandoned the large-scale closed-door R&D model it had used in the past.
NIO's adjustment came almost simultaneously with XPeng's. Several autonomous driving leaders departed in succession: Ma Ningning, head of NIO's world model; Huang Xin, head of intelligent driving products; Zheng Ke, head of the intelligent driving project management department; and Wu Zhao, head of the intelligent driving edge-side AI engine deployment department. NIO's explanation: "The proactively adjusted organizational structure will better support NIO's all-out sprint toward developing and delivering version 2.0 of the world model." This is already the third reorganization of NIO's autonomous driving department within a year.
Every switch of technical route means a shift in the company's resource allocation and corresponding personnel changes. The modular development style of the rule-based era does not suit end-to-end, so when moving from rules to end-to-end, the autonomous driving departments of XPeng, Li Auto, and NIO each went through multiple rounds of reorganization.
In August 2024, as part of the switch to the end-to-end route, XPeng split its original technology development department, which handled algorithm R&D covering perception, planning, control, and positioning, into three new departments: AI end-to-end, AI energy efficiency, and AI application.
With end-to-end now a thing of the past, automakers have quickly reorganized again to meet the needs of new technical routes. Li Auto, for example, once split the end-to-end project led by Xia Zhongpu out of the algorithm R&D department, making the two parallel. But soon afterward, Li Auto settled on VLA as its next-generation route, led by Jia Peng, head of the algorithm R&D department. Personnel and R&D resources tilted toward that department as well, which is considered one of the reasons for Xia Zhongpu's departure.