Intelligent Driving Circle: Everyone Awaits He Xiaopeng

The first batch of those who "deleted the database and started over" are beginning to return to the first echelon.

"If I want to leave the current competitors far behind, what should we do with this generation of intelligent driving technology?"

Two years ago, at XPeng's office in Silicon Valley, this was almost the only question He Xiaopeng asked when he met Liu Xianming.

The question is crucial.

Liu Xianming's answer was to remove the "language" from VLA. Over the hour-long conversation, Liu Xianming did not feel he was being interviewed, nor did he need to persuade the boss to accept a new technical solution. Instead, the two had already begun discussing the specific steps to make it happen.

After Liu Xianming left He Xiaopeng's office, he only had one thought: "This is a place I must come to."

Liu Xianming is the fourth key figure in XPeng's ten years of self-developed intelligent driving technology.

Wu Xinzhou achieved the "generational leadership" of XPeng's intelligent driving; Li Liyun completed XPeng's transformation from the rule-based era to the end-to-end era. But at this stage, many players quickly overtook XPeng with end-to-end technology.

Obviously, XPeng didn't expect others to catch up so quickly.

When XPeng rolled out its 1 millionth vehicle, outside observers remarked: "XPeng's sales have recovered from a slump, but its intelligent driving technology is being chased by 'Li Auto, Huawei and others', and is even questioned for 'relying on past achievements'."

The ups and downs of XPeng's intelligent driving technology are a microcosm of the breakthrough history of intelligent driving technology among Chinese new-energy vehicle startups, all centered on competition in three capabilities: system, mass production, and algorithm.

But at the same time, it has its own uniqueness.

XPeng changed its leaders three times in eight years, which implies another meaning:

The real intelligent driving war is not about the current technological gap, but about fighting against organizational inertia.

The times are changing, the architecture is changing, and the leading figures also need to change.

Looking around, only a few car companies dare to repeatedly revolutionize themselves in the field of intelligent driving.

This self-revolution doesn't mean overthrowing everything in the end-to-end era. Instead, it's about having the courage to start over: are you brave enough to tear down the pyramid you painstakingly built in the past? Are you brave enough to admit that everyone has their own historical mission, and that fulfilling that mission feels like 'eliminating yourself'?

The Beginning of the Problem: Why Does XPeng Need to Leave the 'Competitors' Behind?

In 2024, the reason why He Xiaopeng asked "how to leave the competitors behind" was that he realized earlier that XPeng's intelligent driving technology was no longer 'a cut above the rest'.

The starting point of this question comes from the intelligent driving worldview established by Wu Xinzhou and Li Liyun.

Wu Xinzhou built XPeng's intelligent driving technology at the peak of the rule-based era.

Wu Xinzhou, with a background in systems, has a much stronger overall awareness than most leaders. One year after joining XPeng, he established three complete teams for XPeng: perception, planning and control, and mapping, helping XPeng elevate its technology, team, and system from the budding stage to generational leadership.

More important than the team, Wu Xinzhou created a closed-loop development model for intelligent driving data, making XPeng a leader in the rule-based era.

Looking back at the three technical paths in the industry at that time:

The first type is Tesla: HydraNet plus rule-based planning and control. At its core, the perception module detects the environment and target objects through HydraNet.

The second type is traditional car companies: they adopt black-box solutions from suppliers such as Mobileye and Bosch, which are stable but iterate slowly and are not smart enough.

The third type is new-energy vehicle startups with full-stack self-development: XPeng adopted a rule-based multi-sensor fusion architecture in XPilot 3.0, and self-developed and delivered what was then the best-experience highway NOA. In the subsequent head-to-head urban NOA comparisons, it stayed in the top three for speed and quality.

XPilot 3.0 and 3.5 are Wu Xinzhou's masterpieces.

But even then, moving from highways to urban areas meant facing different scenarios, and the only way to do so was to 'rewrite every prediction and planning algorithm'.

The XPilot series that Wu Xinzhou created for XPeng belongs to the rule-based era of intelligent driving, mainly featuring a segmented architecture of positioning, perception, decision-making, planning, and control.

The only difference is that Wu Xinzhou realized earlier than others that vehicle-side data could be turned into an engine for rapid algorithm iteration. In the rule-based era, XPeng was also a player with data iteration capabilities similar to Tesla's.

Li Liyun, the successor, guarded XPeng on the eve of the end-to-end era.

In the pre-end-to-end era, two thresholds need to be crossed: end-to-end perception and model-based decision-making and planning.

The former splits the autonomous driving architecture into two major modules: perception on one side, and prediction, decision-making, and planning on the other. The latter integrates the functions of prediction, decision-making, and planning into a single neural network.

Li Liyun led the full transformation of XPeng's XPilot architecture into XNGP+, safeguarded mass production, and completed the two stages of the 'pre-end-to-end' era.

The underlying architecture of XNGP+ is a preliminary end-to-end large-scale model, consisting of the perception neural network XNet, the planning and control large-scale model XPlanner, and the AI large language model XBrain. This is a typical example of model-based decision-making and planning.

After the intelligent driving industry entered the 'post-end-to-end era' at the end of 2024, many players turned the tables with one-stage end-to-end and VLA architectures.

In the battle to expand into new cities, Huawei and Li Auto quickly caught up. Later, when XPeng adopted end-to-end technology, others proposed the One Model. When XPeng adopted VLA, many other VLA adopters appeared overnight. Everyone seemed to be in the first echelon.

Wu Xinzhou built a city, and Li Liyun guarded a city.

However, the organizational inertia around the rule-based era and the pyramid built on rules have become a burden instead.

To achieve the ultimate end-to-end, half of the city needs to be destroyed before building a new one. Success can be a curse, and XPeng is no exception.

Neither of them could answer He Xiaopeng's question.

The state of keeping pace with other players is intolerable to He Xiaopeng.

Removing the 'Language' in VLA Became the Watershed for XPeng's Intelligent Driving Technology

In the post-end-to-end era, Liu Xianming found the answer:

- "If I want to leave the current competitors far behind, what should we do with this generation of intelligent driving technology?"

- Remove the 'language' in VLA.

The VLA architecture, i.e., Vision-Language-Action, is a typical end-to-end architecture. It changes the modular processing method of autonomous driving in the rule-based era. VLA turns sensor data into language and symbols, and then forms decisions through reasoning and hands them over to the vehicle for execution.

Compared with the two-stage end-to-end architecture, it has stronger understanding ability and interpretable reasoning traces, avoiding the black-box situation.

Liu Xianming led the "second-stage revolution" of XPeng's VLA.

VLA1.0: Vision-Language-Action, which requires two language translations. Vision and language go in and language comes out; that language is then turned into trajectory points (go points) or actions, which are finally fed into the end-to-end model for decision-making.
VLA2.0: Vision+Language-Action, which removes the "wall" of language. Language and vision are taken in as information for reasoning, and the model directly outputs actions and final results.

The most fundamental change is that the intermediate step of "translating sensor signals into language tokens" is removed. The reasoning task moves from a large language model (LLM) to a multi-modal Transformer large-scale model.

There are two reasons for this:

Firstly, it solves the information loss in traditional VLA.

Traditional VLA needs two language translations. Converting between continuous signals and discrete, structured language along the way loses a large amount of information about the physical world. The second-generation VLA relies more on continuous signals to complete tasks, and its network structure is extremely simple.

Secondly, it solves the limited output of traditional VLA and improves the model's efficiency and generalization ability.

Language is discrete, while control signals (vehicle steering, acceleration) are continuous quantities. Traditional VLA has difficulty precisely controlling physical systems, limiting the model's performance in complex scenarios. The second-generation VLA removes the language translation link, which can simplify the training method and directly output actions in the physical world.
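The gap between discrete language and continuous control can be illustrated with a toy round trip. The sketch below is not XPeng's or Tesla's pipeline: the five-word steering vocabulary, the angle values, and every function name are assumptions made up for illustration. It only shows that forcing a continuous steering angle through a small set of words loses precision, while passing the value through directly does not.

```python
# Toy illustration only; the vocabulary and all names here are hypothetical.
STEERING_WORDS = {
    "hard left": -30.0,
    "left": -15.0,
    "straight": 0.0,
    "right": 15.0,
    "hard right": 30.0,
}


def to_language(angle_deg: float) -> str:
    """Map a continuous steering angle to the nearest word (the discrete bottleneck)."""
    return min(STEERING_WORDS, key=lambda w: abs(STEERING_WORDS[w] - angle_deg))


def from_language(word: str) -> float:
    """Decode a steering word back into an angle in degrees."""
    return STEERING_WORDS[word]


def via_language(angle_deg: float) -> float:
    """Path with a language step in the middle: angle -> word -> angle."""
    return from_language(to_language(angle_deg))


def direct(angle_deg: float) -> float:
    """Path without a language step: the continuous value is passed through."""
    return angle_deg


def mean_abs_error(path, angles) -> float:
    """Average absolute difference between the desired and the delivered angle."""
    return sum(abs(path(a) - a) for a in angles) / len(angles)


desired = [-22.0, -4.5, 0.0, 3.2, 11.0, 27.5]
language_error = mean_abs_error(via_language, desired)
direct_error = mean_abs_error(direct, desired)

print(f"error via language tokens: {language_error:.2f} deg")
print(f"error with direct output:  {direct_error:.2f} deg")
```

A real model would use a far larger vocabulary and learned decoders, so the loss would be smaller than in this toy case, but the direction of the effect is the same one the paragraph above describes.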

For example, XPeng's Super LCC can roam a park without any navigation or text instructions. The model also makes self-supervision possible: when XPeng rolls out autonomous driving globally, it can run generalization training without data annotation.

Interestingly, XPeng's second-generation technology has something in common with Tesla's FSD V14.

The core of FSD V14 is also a multi-modal model. It takes in fused information from vision, navigation map, sound, and the vehicle's own state, produces a joint result after reasoning, and then makes the driving decision from that result.

This multi-modal signal generates Language as an intermediate representation on the one hand, and signals such as panoramic segmentation, 3D occupancy, and 3D Gaussian representation on the other, which jointly determine the output Action.

In the cloud, both XPeng and Tesla have something like a world model, and the two serve the same function.

This world model has changed from intelligently generating environmental scenarios in the past to a prediction system that can imagine and evaluate the quality of decisions.

XPeng's world model is called the world simulator. When V and L output trajectories and decisions, they will be recorded in the world model, and the world model is trained with VLA data. The main task is for the simulator to generate different driving decisions and score these different strategies.

This is consistent with the main function of Tesla's neural simulator, which verifies whether the new model is better and synthesizes low-frequency extreme scenarios.
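The idea of a simulator that generates different driving decisions and scores them can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of XPeng's world simulator or Tesla's neural simulator: the `Rollout` fields, the hand-written candidates, and the weights in `score` are all hypothetical, and in a real system the rollouts would come from a learned model rather than fixed numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """Hypothetical summary of one simulated driving decision."""
    name: str
    min_gap_m: float    # closest distance to any obstacle, in meters
    progress_m: float   # distance travelled along the route, in meters
    max_jerk: float     # peak jerk, used here as a rough comfort proxy


def score(r: Rollout, safe_gap_m: float = 2.0) -> float:
    """Score a rollout: reward progress, penalize small gaps and harsh motion."""
    if r.min_gap_m <= 0.0:
        return float("-inf")  # a collision is never acceptable
    safety_penalty = max(0.0, safe_gap_m - r.min_gap_m) * 10.0
    return r.progress_m - safety_penalty - 0.5 * r.max_jerk


def rank(rollouts):
    """Return the rollouts ordered from best to worst score."""
    return sorted(rollouts, key=score, reverse=True)


# Hand-written candidates standing in for simulator output.
candidates = [
    Rollout("keep lane", min_gap_m=3.0, progress_m=40.0, max_jerk=1.0),
    Rollout("overtake", min_gap_m=0.5, progress_m=55.0, max_jerk=4.0),
    Rollout("cut in", min_gap_m=0.0, progress_m=60.0, max_jerk=6.0),
]

best = rank(candidates)[0]
print(best.name)
```

The weights decide everything in this toy: a smaller safety penalty would make the overtake win, which is why scoring functions of this kind need to be validated against real driving data.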

Liu Xianming once said that they didn't know how Tesla's FSD V14 was developed. It was only after seeing Tesla's speech that they found their thinking paths were very similar.

In late December 2025, He Xiaopeng completed two cross-border evaluations of Tesla and XPeng (FSD V14 and XPeng's second-generation VLA). Both show "emergent" abilities; for example, both can "stop at a wave of the hand".

Although Tesla has led the reconstruction of intelligent driving technology several times, through actual tests, we can see that XPeng has the opportunity to overtake Tesla by upgrading its advanced technical architecture.

The Spark in Silicon Valley Is No Accident

Only Liu Xianming can solve He Xiaopeng's problem.

He Xiaopeng's encounter with Liu Xianming and this huge technological change are not accidental.

There are two inevitabilities behind this. The first one comes from the XPeng R&D center in the United States that He Xiaopeng insisted on keeping, which played a huge role in this change.

Back in 2017, it was not unusual to establish an R&D center in the United States.

At that time, there were overseas R&D centers of Chinese companies everywhere in the Bay Area. In addition to Baidu, Didi, and Pony.ai, even Great Wall and BYD tried to develop autonomous driving in the Bay Area.

These companies all hoped that the talents in the Bay Area could "install a tap" for their technology sources, so that they could have access to water whenever they needed.

Seven years later, these overseas R&D centers have either shrunk significantly or no longer exist.

The technological competition in autonomous driving is more long-term than many people think. He Xiaopeng realized earlier that the role of the R&D center in the United States is not a tap, but a spark that keeps advanced technology alive.

Over the past ten years, only XPeng has maintained a considerable number of R&D personnel in Silicon Valley: there is still a team of about 200 people in the Bay Area.

The Bay Area will always come in handy in a more unexpected way.

XPeng's North American R&D center in California, source: Xiaohongshu

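The views expressed in this article are solely those of the author; the 36Kr platform only provides information storage services.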
