He Xiaopeng: On the path of large models, everyone is crossing the river by feeling for the stones | 36Kr Exclusive Interview
Text by | Li Anqi
Edited by | Li Qin
Before June, the Chinese car with the highest AI computing power for assisted driving was NIO's executive sedan, the ET9, with more than 2000 TOPS and a starting price of 788,000 yuan. XPeng Motors has now upended that ranking with a new car.
On the evening of June 11, XPeng unveiled its latest SUV, the G7, in Guangzhou. CEO He Xiaopeng spent more than half of the launch event introducing the car's assisted-driving chip, "Turing". In the media session afterward, most questions, and most of He Xiaopeng's answers, also revolved around the chip.
The new G7 is equipped with three of XPeng's self-developed Turing AI chips. He Xiaopeng said their effective computing power is equivalent to that of nine Orin-X chips, "equivalent to an effective computing power of over 2200 TOPS, which is 3-28 times that of other chips in the industry."
The most mainstream solution in the industry today uses two NVIDIA Orin-X chips, for a combined 508 TOPS. Even NVIDIA's latest-generation in-vehicle AI chip, Thor-U, which will go into new Xiaomi and Li Auto cars this year, delivers only around 700 TOPS.
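The headline figures can be checked with simple multiplication. The sketch below assumes the commonly cited rating of 254 TOPS per Orin-X, which is consistent with the 508 TOPS quoted for two chips. "Effective" computing power is XPeng's own equivalence claim, so this only reproduces the nominal arithmetic; it is not a benchmark.

```python
# Back-of-envelope check of the computing-power figures quoted above.
# Assumption: one NVIDIA Orin-X is rated at 254 TOPS, which matches the
# 508 TOPS the article gives for a dual Orin-X setup.
ORIN_X_TOPS = 254

def orin_equivalent_tops(n_orin: int) -> int:
    """Nominal TOPS of n Orin-X chips (simple multiplication)."""
    return n_orin * ORIN_X_TOPS

g7_ultra = orin_equivalent_tops(9)   # "equivalent to nine Orin-X chips"
dual_orin = orin_equivalent_tops(2)  # mainstream dual Orin-X solution
thor_u = 700                         # approximate Thor-U figure from the article

print(g7_ultra)                        # 2286, i.e. "over 2200 TOPS"
print(round(g7_ultra / dual_orin, 1))  # 4.5x the dual Orin-X solution
print(round(g7_ultra / thor_u, 1))     # 3.3x a single Thor-U
```

The roughly 3.3x ratio against Thor-U sits at the low end of the "3-28 times" range He Xiaopeng quoted.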
XPeng's aim is to secure five years of computing-power headroom. He Xiaopeng said at the launch, "Many analyst reports say that in the future, hundreds of TOPS, 1000 TOPS, or even 4000 TOPS of computing power will be needed. That might happen in the next 2-3 years, and the latest estimate is by 2030, five years from now. XPeng has achieved this ahead of time."
He believes high computing power will be the baseline for L3 and even L4 autonomous driving. XPeng therefore positions the G7 as "the first AI car with L3-level computing power", with pre-sale prices starting at 235,800 yuan.
According to He Xiaopeng, "L3-level computing power" and "AI capabilities" are the first steps toward an L3 car: they mean the car already has L3-level intelligence. But only after dual hardware redundancy is in place and legal and regulatory certification is obtained can it be regarded as a true L3 car.
For now, XPeng has no plan to make the Turing AI chip standard across the lineup: the G7 Max still uses the dual NVIDIA Orin-X solution.
In the higher-end G7 Ultra, XPeng fits three Turing AI chips. Two of them handle assisted driving, running the VLA-OL (Vision-Language-Action) model deployed locally in the car. "The ceiling of its assisted-driving capability is more than 10 times higher than that of the industry's Max models."
The third runs the intelligent cockpit, hosting a VLM (Vision-Language Model). He Xiaopeng believes that one criterion for a car with L3-level computing power is local deployment of both VLM and VLA large models.
VLM and VLA are currently the two most popular kinds of multimodal large model. A VLM focuses on understanding images/video and text, i.e., perception and cognition. A VLA builds on the VLM by adding an "action" capability for interacting with the physical world, such as generating control signals for an assisted-driving system.
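The difference can be pictured as two function signatures: a VLM maps images and text to text, while a VLA maps the same inputs to an action. This is a toy sketch only; the types, field names, and the rule-based "policy" are hypothetical stand-ins for learned models and do not describe XPeng's implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical types for illustration only; they are not XPeng's interfaces.
@dataclass
class Observation:
    camera_frames: List[str]  # placeholder for image/video input
    instruction: str          # text input, e.g. a passenger request

@dataclass
class Action:
    steering_deg: float       # hypothetical control signals
    accel_mps2: float

def vlm(obs: Observation) -> str:
    """VLM: vision + language in, language out (perception and cognition)."""
    return f"Saw {len(obs.camera_frames)} frame(s); request: {obs.instruction}"

def vla(obs: Observation) -> Action:
    """VLA: same inputs as a VLM, plus an 'action' step that emits controls."""
    understanding = vlm(obs)  # reuse the VLM-style understanding step
    # Toy rule standing in for a learned action decoder.
    if "slow down" in understanding:
        return Action(steering_deg=0.0, accel_mps2=-1.5)
    return Action(steering_deg=0.0, accel_mps2=0.0)

obs = Observation(camera_frames=["front_cam_0"], instruction="please slow down")
print(vlm(obs))  # text out
print(vla(obs))  # control signal out
```

The point of the sketch is the return type: the VLA's output is something a vehicle can execute, which is what "interacting with the physical world" means here.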
36Kr previously reported that a single Turing AI chip can run a large model with up to 30B (30 billion) parameters. With the two assisted-driving chips combined, XPeng's VLA model will indeed have more room to grow.
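To see why 30B parameters is a meaningful ceiling, here is a rough estimate of how much memory the weights alone occupy. It rests on stated assumptions: the article does not say which numeric precision the Turing chip uses, so several common ones are shown; activations and caches are ignored; and two chips are treated as simply doubling capacity, even though splitting a model also costs inter-chip bandwidth.

```python
# Rough weight-memory footprint, ignoring activations and caches.
# Assumption: precision is not stated in the article, so common options are shown.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Gigabytes (1e9 bytes) needed just to store the weights."""
    return params_billion * BYTES_PER_PARAM[precision]

for p in BYTES_PER_PARAM:
    one_chip = weight_gb(30, p)
    print(f"{p}: ~{one_chip:g} GB of weights for 30B params; "
          f"two chips could hold ~{2 * one_chip:g} GB")
```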
"High computing power can significantly raise the ceiling of AI capabilities and also greatly raise the floor, especially the safety-related floor," He Xiaopeng said at the launch.
He gave an example. On a straight road, a VLA and an "end-to-end" solution may drive much the same. But "what difference does more computing power make? With low computing power, the model may run at 3 or 5 frames per second, and at most no more than 10. With high computing power, it can run at 10, 20, 30 frames per second or even faster. High computing power reduces latency. What does that mean? Although the driving on a straight road looks the same, the safety levels are completely different."
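The frame rates He Xiaopeng cites translate directly into time and distance between model updates. A minimal sketch, assuming a hypothetical cruising speed of 100 km/h and treating the update interval (1 / fps) as the model's reaction delay, ignoring sensor and actuator latency:

```python
# How far a car travels between two model updates at a given frame rate.
# Assumptions: 100 km/h cruising speed (hypothetical); the update interval
# 1/fps stands in for reaction delay; sensor/actuator latency is ignored.
def frame_interval_ms(fps: float) -> float:
    return 1000.0 / fps

def metres_per_frame(speed_kmh: float, fps: float) -> float:
    speed_mps = speed_kmh / 3.6  # km/h -> m/s
    return speed_mps / fps

for fps in (3, 5, 10, 20, 30):
    print(f"{fps:>2} fps: {frame_interval_ms(fps):6.1f} ms, "
          f"{metres_per_frame(100, fps):5.2f} m at 100 km/h")
```

At 3 fps the car covers more than 9 m between decisions; at 30 fps, under 1 m. That gap is invisible on an uneventful straight road but matters as soon as something unexpected happens.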
XPeng calls its Turing-based model the VLA-OL model. It adds a "motor-type brain" on top of the end-to-end system's "behavioral cerebellum". Based on XPeng's internal reinforcement-learning training, "the ceiling of its intelligent-driving capability is more than 10 times higher than that of the industry's Max-tier solutions."
He Xiaopeng said the VLA model performs better in situations such as recognizing ambulances, choosing the best moment to change lanes, and even handling road collapses and potholes.
However, he also stressed that these scenarios are still on the roadmap and cannot be delivered immediately. In other words, the chip hardware comes first; the user experience and features are not yet fully ready.
At the communication meeting, He Xiaopeng told 36Kr Auto, "On the path of large models, everyone is groping forward."
The VLA approach appears to be becoming the choice of China's top-tier assisted-driving players; Li Auto had already begun developing it. It is also where Chinese players diverge from Tesla's FSD. Judging from public information, Tesla remains committed to its "end-to-end" approach and does not appear to be pursuing multimodal large models.
Where they do align with Tesla is in demanding high computing power on the vehicle. Tesla has planned its next-generation autonomous-driving hardware, AI5, and some industry analysts estimate its computing power at 3000-7200 TOPS. Combined with Tesla's software-hardware integration, its next-generation assisted-driving solution may come closer to true autonomous driving.
That is also XPeng's goal: software-hardware integration should deliver more efficient capabilities. "Through compiler optimization, our capabilities can be further enhanced. Within a year and a half, we could make one chip equivalent to four instead of three. That would be amazing. We are continuing to push those possibilities."
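What "four instead of three" would mean in nominal terms can be worked out from the article's own numbers. This sketch assumes that three Turing chips equal nine Orin-X (so each Turing chip counts as three) and that each Orin-X is rated at 254 TOPS; reading the quote as a per-chip equivalence across all three chips is an interpretation, not something XPeng has spelled out.

```python
# Arithmetic behind "one chip equivalent to four instead of three".
# Assumptions: 3 Turing chips = 9 Orin-X (from the article), i.e. each Turing
# chip counts as 3 Orin-X; 254 TOPS per Orin-X.
ORIN_X_TOPS = 254
TURING_CHIPS = 3

def equivalent_tops(orin_per_turing: int) -> int:
    """Nominal Orin-X-equivalent TOPS for the three-chip G7 Ultra setup."""
    return TURING_CHIPS * orin_per_turing * ORIN_X_TOPS

today = equivalent_tops(3)   # 9 Orin-X equivalents
target = equivalent_tops(4)  # 12 Orin-X equivalents after compiler work
print(today, target, f"+{(target - today) / today:.0%}")
```

Under these assumptions, compiler gains alone would lift the nominal figure from about 2286 to about 3048 TOPS, a one-third increase with no hardware change.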
However, the technology roadmap in assisted driving is changing rapidly, with new approaches appearing almost every year. According to He Xiaopeng, a key module of the Turing chip was scrapped during R&D in 2022, and its real architecture was finalized in 2022.
To support a five-year headroom period, the Turing chip team must anticipate technology changes over the next eight years, which places heavy demands on the chip's memory bandwidth, among other things. Even NVIDIA ran into "big problems" such as thermal design and low yields when mass-producing its latest-generation chip, Thor. As a newcomer to in-house chip development, XPeng's trials are only just beginning.
Beyond the Turing AI chip, He Xiaopeng also presented other G7 highlights: a single 702-km ultra-long-range option, cargo space for 37 20-inch suitcases, the same AR HUD as Huawei's, and an AI "Eagle-Eye" vision solution.
With a pared-down SKU lineup, strengths inherited from XPeng's best-selling models, a Huawei-partnered HUD, and its own ultra-high-compute chip, XPeng's hopes for the G7 are clear: not only to fill the price gap between the G6 and G9, but also to achieve both high sales volume and a high selling price.
In the 200,000-250,000 yuan pure-electric SUV segment in the second half of the year, however, the G7 will face rivals such as the Xiaomi YU7 and Li Auto i6. The competition is heating up before the battle has even begun.
Overall, XPeng has staked almost all of its exploration of large models in assisted driving on the Turing AI chip. In turn, mass production and delivery of the Turing chip underpin XPeng's ambitions for L3 and VLA large models. With the two so intertwined, XPeng will have to proceed step by step in "uncharted territory".
The following is an edited transcript of the conversation between 36Kr, other media, and XPeng Motors CEO He Xiaopeng and XPeng G7 product manager Nick:
Question: Why can XPeng's self-developed chip deliver higher effective computing power than general-purpose chips? Some companies run into slow inference, low accuracy, and serious heat generation when running algorithms on their self-developed chips. Has XPeng encountered similar problems? How did you solve them?
He Xiaopeng: To be honest, our chip development was very bumpy at first. We've been very lucky over the past year: from tape-out to application, everything has gone smoothly. But we've also faced many challenges, and recently we've put a lot of manpower into tools and compilers.
Through compiler optimization, our capabilities can be further enhanced. Within a year and a half, we could make one chip equivalent to four instead of three. That would be amazing. We are continuing to push those possibilities. And if we find problems with memory usage, reliability, heat generation and so on after true large-scale mass production, that will be very valuable to us too.
Question: At this stage, how will users perceive the VLA and VLM capabilities the Turing chip brings, and be convinced to choose the Ultra version?
He Xiaopeng: Computing power is the foundation. We are developing many interesting functions. The G7 may not have many of them at launch, but OTA updates will add new functions every month. I hope some significant functions will launch this year.
We've only been developing for a little over a year. The most important thing is to get the large model running well on the new computing platform; that is the foundation. Then we need to optimize performance, stability, and effectiveness, feed in large amounts of training data, and build basic capabilities. Only by combining different basic capabilities can we create valuable scenarios.
We've completed the first three steps and are now turning them into interesting scenarios. We've fitted this much computing power and memory so that we can keep progressing over the next year or two.
Question: Three Turing chips are used, one for the cockpit and two for intelligent driving. How do the two chips collaborate? Does one run the VLM and the other run the VLA? How are the chips connected?
He Xiaopeng: This involves collaboration both between cores and between chips. Between chips, we use PCIe. Our memory is spread across four computing units, including a Qualcomm 8295. Running larger models in the future will require more memory bandwidth, which is indeed a challenge.
This is also what our next-generation EEA (electrical/electronic architecture) is working on. Part of it can be solved through self-developed chips, for example D2D (die-to-die interconnect) and other next-generation technologies. The second part is the design of the electrical/electronic architecture itself. We are exploring some interesting capabilities, such as combining two or more Turing chips to run a huge VLA model. We'll share the results once we succeed.
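One generic way to run a model that is too big for one chip is to split its layers into consecutive groups, one group per chip, so each chip's share of the weights fits its memory. The sketch below shows that idea (pipeline-style partitioning) with made-up layer sizes and memory budgets; it is an illustration of the general technique, not a description of how XPeng combines Turing chips.

```python
from typing import List

# Hypothetical sketch of pipeline-style partitioning: assign consecutive layers
# to chips so each chip's weights fit its memory budget. Sizes and budgets are
# made-up numbers, not Turing specifications. Every boundary between stages
# means activations must cross the inter-chip link (PCIe, per He Xiaopeng).
def partition_layers(layer_gb: List[float], chip_budget_gb: float,
                     n_chips: int) -> List[List[int]]:
    """Greedily fill chips with consecutive layers; raise if they don't fit."""
    stages: List[List[int]] = [[]]
    used = 0.0
    for i, size in enumerate(layer_gb):
        if size > chip_budget_gb:
            raise ValueError(f"layer {i} alone exceeds one chip's memory")
        if used + size > chip_budget_gb:
            stages.append([])  # current chip is full; move to the next one
            used = 0.0
        stages[-1].append(i)
        used += size
    if len(stages) > n_chips:
        raise ValueError("model does not fit on the available chips")
    return stages

# 48 layers of 1 GB each, two chips with a hypothetical 32 GB budget each.
plan = partition_layers([1.0] * 48, chip_budget_gb=32.0, n_chips=2)
print([len(stage) for stage in plan])
```

The greedy split keeps layer order intact, which is what a pipeline needs; the cost is that each frame's activations must travel between chips at every stage boundary, which is why He Xiaopeng frames this as a memory-bandwidth and architecture problem.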
Question: 2000 TOPS of computing power is treated as the threshold for an L3-level car. How did XPeng arrive at this standard internally?
He Xiaopeng: Currently, high-end L2 computing power in the industry is basically 500 or 700 TOPS, and to some extent the difference between them is not significant. I believe only a several-fold increase in computing power gives a chance of a several-fold improvement in the model. We've seen that some competitors will offer cars with 2000 or even 4000 TOPS. This will be the baseline for L3, and some companies even think it's the baseline for L4 autonomous driving.
Why not use more? Fifty chips at 100 TOPS each add up to 5000 TOPS. But can they run a large model? No. That depends on chip technology and memory bandwidth; it's a complex issue. So, for now, this is the optimal solution.
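Why summed TOPS can mislead is easiest to see with a rough, roofline-style estimate for token-by-token inference, where every step must read all of a chip's weights from memory once. All hardware numbers below (100 TOPS, 300 GB/s, 8 GB) are hypothetical illustrations, not specifications of any real chip.

```python
# Roofline-style estimate for token-by-token inference, where each step
# reads every weight from memory once. Hardware numbers are hypothetical.
def compute_bound_steps_per_s(params_billion: float, tops: float) -> float:
    ops_per_step = 2 * params_billion * 1e9  # ~1 multiply + 1 add per weight
    return tops * 1e12 / ops_per_step

def memory_bound_steps_per_s(weight_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / weight_gb

params_b = 30.0             # 30B parameters
weight_gb = params_b * 1.0  # INT8: 1 byte per parameter

compute_limit = compute_bound_steps_per_s(params_b, tops=100)
memory_limit = memory_bound_steps_per_s(weight_gb, bandwidth_gb_s=300)
print(round(compute_limit), memory_limit, min(compute_limit, memory_limit))

# Capacity matters too: a chip with, say, 8 GB cannot hold 30 GB of weights,
# so the model must be sliced across chips, adding inter-chip traffic.
print(weight_gb <= 8)
```

Under these assumptions, 100 TOPS of compute would allow well over a thousand steps per second, yet memory bandwidth caps the model at about 10, and a small-memory chip cannot hold it at all. Adding more small chips raises the TOPS total without removing either bottleneck.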
Question: The industry sees VLA as an upgraded version of VLM. Why does XPeng deploy both models locally at the same time? How is the Turing chips' computing power allocated between intelligent driving and the cockpit?
He Xiaopeng: VLA is in effect the "motor-type brain" plus the "behavioral cerebellum", while VLM is the "brain" of the whole vehicle and its interface for interacting with people. VLA is a faster model, running at up to 20 or 30 frames per second, while VLM runs at 2 frames per second, or 3 frames every two seconds, so it can be a larger but slower model. The two are linked through data, but there won't be a complete end-to-end data chain between them, and none is needed.
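The "linked through data, but not end-to-end" arrangement can be pictured as two loops running at different rates: the slow model periodically refreshes a shared context, and the fast model acts every tick using whatever context is latest. This toy scheduler uses the article's rates (VLA at about 30 fps, VLM at 2 fps); the shared-context mechanism itself is a hypothetical illustration, not XPeng's design.

```python
from collections import Counter

# Toy dual-rate scheduler: a fast "VLA" loop reads a context that a slow
# "VLM" loop refreshes. Rates follow the article; the rest is hypothetical.
VLA_HZ = 30
VLM_HZ = 2

def simulate(seconds: int = 1):
    context = None
    vlm_updates = 0
    uses = Counter()                  # how many VLA steps used each context
    ticks_per_vlm = VLA_HZ // VLM_HZ  # 15 fast ticks per slow update
    for tick in range(seconds * VLA_HZ):
        if tick % ticks_per_vlm == 0:
            vlm_updates += 1
            context = f"scene summary #{vlm_updates}"  # slow, larger model
        uses[context] += 1  # fast model acts every tick with latest context
    return vlm_updates, dict(uses)

print(simulate(1))
```

Each VLM output here informs 15 consecutive VLA decisions, but the VLA never waits for the VLM, which is one reading of why the two models can share data without forming a single end-to-end chain.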
In the future, there may be a vehicle with extremely high computing power and a stronger model, and perhaps one model will be enough. But for now, it's not possible. So, we still need to divide the functions, with some for movement, some for memory, some for dialogue, and some for thinking, just like the division of the brain.
Question: When will the Turing chip become the standard configuration for XPeng models?
He Xiaopeng: XPeng will continue to use NVIDIA. NVIDIA is our good partner and a benchmark for us to learn from. We may choose the Turing chip or NVIDIA chips in different configurations at different times. This is a process, and there is no one-size-fits-all approach.
Question: Tesla's latest hardware platform, AI5, is said to deliver about 3000-7200 TOPS. Has XPeng studied what Tesla uses its computing power for? Tesla doesn't use a VLA large model. Is this a major difference between you and Tesla?
He Xiaopeng: The computing power figures I've heard are different. Maybe we'll have the answer next year, but I believe it will be a very large number.
I think L4 really does require more computing power. Tesla hasn't said what kind of model it uses, but I don't think that matters. Today, China's leading intelligent assisted-driving companies may not really care, because we have a very clear path.
On the path of large models, everyone is groping forward. The consensus for innovation is to dare to blaze our own trail. We and a few other competitors are doing this.
Question: XPeng has also pledged to pay suppliers within 60 days. What impact will this have on XPeng? Will XPeng really pay in cash, or use commercial acceptance bills?
He Xiaopeng: Our payment term was relatively long in the first half of last year. It was significantly shortened in the second half of last year and also in the first quarter of this year. The industry is moving in the same direction as us. Internally, we say that, first, the supply chain is very happy; second, we originally planned to achieve this goal when we turn profitable in the fourth quarter. The industry is improving, and we're seeing if we can accelerate the realization of this goal earlier. Now, 17 car companies have made similar announcements, which is obviously beneficial to the whole industry. I hope that in the future, everyone will not only make the commitment but also achieve it quickly.
Question: Can you help consumers fundamentally distinguish between the XPeng G6, G7, and G9 to help them make better choices?
Nick: The G6 is a model for middle-class families, combining practicality and technology. The G7 is for middle-class buyers, young users, and young families: it suits individuals and couples, and is also very comfortable for a family of three. The G7 is also very strong in interior space and technology.
The G9 is our high - end model, with standard air suspension. It has higher requirements for overall performance and experience. Middle - class users who pursue higher - quality products can consider the G9.