Lose the bet and run naked: He Xiaopeng wagers that XPeng will catch up with Tesla's FSD by next August.
He really is getting anxious! He Xiaopeng is piling pressure on the self-driving team.
He Xiaopeng bets on catching up with FSD next year; if not, the head of autonomous driving runs naked.
On December 11th, He Xiaopeng posted on his WeChat Moments that the previous day he had test-driven and compared Tesla's latest FSD V14.2 and its Robotaxi in Silicon Valley. In his view, Tesla has now entered a quasi-L4 stage; although some flaws remain, it far exceeds last year's level.
He also mentioned that, due to time constraints, the first version of XPeng's second-generation VLA cannot yet match all the capabilities of the current FSD V14.2.
But he made a bet with the self-driving team: if, by August 30, 2026, XPeng's VLA achieves in China the same effect that Tesla's FSD V14.2 delivers in Silicon Valley, he will open a Chinese-style canteen in Silicon Valley;
if it falls short, Liu Xianming, head of XPeng's Autonomous Driving Center, has promised to run naked on the Golden Gate Bridge.
The main bridge of the Golden Gate Bridge is 1,967 meters long, so there should be quite a few onlookers.
XPeng Motors has announced that the second-generation VLA will be officially launched in the first quarter of 2026 and plans to push it to all Ultra models.
The bet's deadline falls almost exactly five months after the second-generation VLA's release, the crucial window for optimizing the initial version into a mature one, which fits He Xiaopeng's own words about "moving forward steadily for long-term success."
However, the bet does look a bit lopsided: if they lose, it is the employee who has to run naked. Liu Xianming: so I'm the only one who gets hurt here?
Liu Xianming joined XPeng in March last year and took over as the top executive in charge of autonomous driving in October this year, so he has been with XPeng for less than two years. Who knows, by next year he might have already left the company (just joking).
Come on, we're all adults here. Why choose? What if we want to see both outcomes?
He Xiaopeng: The second-generation VLA can reach the upper limit of L4
On December 11th, the same day as the bet, He Xiaopeng was interviewed by "China Entrepreneur" magazine in Guangzhou. He said that during the tests of the second-generation VLA in recent months, he felt for the first time:
"Previously, our logic was to achieve a good L2 level or an L2 level infinitely close to L3. But now I see the possibility of reaching the upper limit of L4. Even if given 3 - 5 more years, we might even reach L5."
Why is He Xiaopeng so confident about the second-generation VLA?
VLA was first proposed by Google DeepMind in 2023. It refers to integrating the three core capabilities of vision, language, and action into a single model: instead of breaking a task into multiple steps, it completes the whole process, from perceiving the scene and reading instructions to executing actions, in one go.
Moreover, it can not only understand the environment but also think and explain its decisions like a human being.
Currently, in addition to XPeng Motors, other automakers such as Li Auto and Great Wall Motors are also deploying VLA.
It should be noted that this VLA solution still requires a two-step conversion process: first converting vision into language and then converting language into action.
Therefore, XPeng is thinking about whether it can skip the language step and directly execute tasks from vision.
At its Technology Day in November this year, XPeng officially unveiled the second-generation VLA, which removes the language translation step and, for the first time, generates action instructions end to end directly from visual signals.
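To make the architectural difference concrete, here is a minimal, purely illustrative sketch, not XPeng's or DeepMind's actual code, contrasting a two-stage vision-to-language-to-action pipeline with a direct end-to-end vision-to-action policy; every class and function name below is a hypothetical stub.

```python
# Illustrative toy only: contrasts a two-stage VLA pipeline with a direct
# end-to-end vision-to-action policy. All components are hypothetical stubs.
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    steering: float   # radians, left negative / right positive
    accel: float      # m/s^2, braking negative


def encode_vision(camera_frames: List[bytes]) -> List[float]:
    """Stub visual encoder: turns raw frames into a feature vector."""
    return [float(len(f)) for f in camera_frames]


# --- Approach 1: two-stage pipeline (vision -> language -> action) ---------
def describe_scene(features: List[float]) -> str:
    """Stub VLM step: summarizes the scene as a language description."""
    return "yellow light ahead, pedestrian on the right"


def plan_from_text(description: str) -> Action:
    """Stub language-conditioned planner: maps text to a control action."""
    if "yellow light" in description:
        return Action(steering=0.0, accel=-1.5)  # slow down
    return Action(steering=0.0, accel=0.5)


def two_stage_policy(camera_frames: List[bytes]) -> Action:
    features = encode_vision(camera_frames)
    text = describe_scene(features)   # intermediate language step
    return plan_from_text(text)


# --- Approach 2: direct policy (vision -> action, no language hop) ---------
def direct_policy(camera_frames: List[bytes]) -> Action:
    features = encode_vision(camera_frames)
    # A single head regresses controls straight from visual features.
    brake = -1.5 if sum(features) > 100 else 0.0
    return Action(steering=0.0, accel=brake)


if __name__ == "__main__":
    frames = [b"frame_0" * 10, b"frame_1" * 10]
    print("two-stage:", two_stage_policy(frames))
    print("direct   :", direct_policy(frames))
```

In the two-stage version, the planner can only act on whatever the intermediate text happens to preserve; that information bottleneck is what the direct vision-to-action design is described as removing.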
The model was trained on nearly 100 million video clips without manual annotation, equivalent to all the extreme scenarios a human driver would encounter over 65,000 years of driving, that is, driving continuously from the era of primitive humans until today.
To develop this model, XPeng used a cloud computing cluster of 30,000 GPUs on Alibaba Cloud and a base large model with 72 billion parameters, completing a full-pipeline iteration every five days.
According to He Xiaopeng, there will be 50,000 GPUs or even more next year, so there is no need to worry about cloud computing power at all.
Some time ago, Wu Yongming, the CEO of Alibaba, personally visited XPeng's headquarters in Guangzhou to meet with He Xiaopeng.
Meanwhile, XPeng's self-developed Turing AI chip delivers 750 TOPS of computing power per chip. Each vehicle carries a cluster of three Turing chips, for a total of 2,250 TOPS, about 4.4 times that of the industry's mainstream dual Orin-X solution (508 TOPS).
XPeng is still exploring the generalization problem
So, what capabilities does XPeng's second-generation VLA still lack to catch up with Tesla's latest FSD V14.2?
One of them is generalization. He Xiaopeng gave an example: in China, a yellow traffic light means you should slow down, but under the traffic rules of some countries a yellow light means neither accelerating nor decelerating, just keeping a constant speed.
He said frankly that how to strike this balance and achieve good generalization is a problem XPeng is still exploring.
Tesla's FSD, by contrast, began global deployment early. With over 6 million test vehicles worldwide, it generates 1.6 billion frames of image data every day, and its cumulative driving mileage has exceeded 9.6 billion kilometers.
Its "Shadow Mode" (an autonomous driving data collection and learning system) can collect data in diverse global traffic environments, including traffic rule areas in different countries and regions such as Europe, North America, and Asia.
In actual tests on a 20-kilometer stretch of complex back roads, Tesla's FSD V13.2.9 needed to be taken over five times, while XPeng's second-generation VLA needed only one takeover.
Vehicles equipped with XPeng's second-generation VLA can also recognize traffic police gestures without prior training, understand traffic lights and react in advance, and even drive safely on a rainy night.
However, the latest FSD V14.2 has not only significantly improved system performance but also resolved more than 95% of the hesitant lane changes and abnormal braking seen in V13.2.9, while introducing a number of new features that bring the autonomous driving experience closer to human driving.
Of course, XPeng's second-generation VLA hasn't been launched yet, so it's still unclear what the actual experience will be like.
I've noted down this bet. Next year, we must see who wins.
This article is from the WeChat official account "Technology Daily Push" (ID: apptoday). The author is Zhao Zhishan. It is published by 36Kr with authorization.