
Auto companies are suffering from the "AI illness": Spending hundreds of millions of yuan per month to compete in intelligent driving computing power.

Li Anqi · 2024-10-30 09:00
Automakers are becoming beasts that devour massive amounts of data and computing power.

Written by Li Anqi

Edited by Li Qin

"Since the second half of the year, Li Auto has bought up almost all the compute cards held by channel distributors," an industry insider said.

The rush to buy computing power, first sparked by the AI large-model startup boom, has shifted to the automotive industry this year. Companies charging towards end-to-end intelligent driving, led by Li Auto, Huawei, and XPeng Motors, are especially aggressive.

Like AI large models, end-to-end intelligent driving models already have billions of parameters and are advancing towards tens of billions. Computing power is the fuel for this data furnace, which means the race for end-to-end technology and computing power resources has become the new deciding rule of intelligent driving.

"Li Xiang (CEO of Li Auto) often asks me whether our computing power resources are sufficient. If not, we buy more," Lang Xianpeng, Vice President of Intelligent Driving at Li Auto, said in an exclusive interview with 36Kr. According to 36Kr's understanding, Li Auto has already stockpiled tens of thousands of compute cards "and is also scouting locations for a data center."

In July, Li Auto's cloud computing power stood at 2.4 EFLOPS. By the end of August it had surged to 5.39 EFLOPS: an increase of nearly 3 EFLOPS in little more than a month.

Similarly, XPeng Motors announced that by 2025, the cloud computing power will increase from the current 2.51 EFLOPS to 10 EFLOPS.

Huawei's intelligent driving has also rapidly expanded the cloud training computing power scale from 5 EFLOPS to 7.5 EFLOPS within two months.

What does this mean? An intelligent driving industry insider told 36Kr that the training graphics cards currently used by automakers are mainly NVIDIA H100 and A800. After the US ban, most of the cards that can be circulated in the market are A800.

According to 36Kr Auto's understanding, an A800 server (with 8 GPUs) is quoted at about 950,000 yuan. At FP16 precision, a single A800 delivers 320 TFLOPS, so 1 EFLOPS (equal to 1,000,000 TFLOPS) works out to roughly 3,125 A800s, or about 390 eight-card servers.

At 950,000 yuan per eight-card server, 1 EFLOPS of computing power costs approximately 370 million yuan.

That is to say, in little over a month Li Auto spent more than 1 billion yuan just on stockpiling compute chips, while XPeng would need to spend about 3.7 billion yuan to hit next year's total computing power target.
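The arithmetic above can be sketched as a back-of-envelope calculation. It is illustrative only: it uses the article's quoted figures, while real fleets mix chip types and procurement prices vary.

```python
# Cost of cloud training compute, using the figures quoted above.
A800_FP16_TFLOPS = 320          # single A800 at FP16, per the article
CARDS_PER_SERVER = 8
SERVER_PRICE_YUAN = 950_000     # quoted price of one 8-card A800 server
TFLOPS_PER_EFLOPS = 1_000_000   # 1 EFLOPS = 1,000,000 TFLOPS

def yuan_per_eflops():
    cards = TFLOPS_PER_EFLOPS / A800_FP16_TFLOPS   # ~3,125 cards
    servers = cards / CARDS_PER_SERVER             # ~390 servers
    return servers * SERVER_PRICE_YUAN             # ~371 million yuan

# Li Auto added ~3 EFLOPS in about a month; XPeng targets 10 EFLOPS total.
li_auto_spend = 3 * yuan_per_eflops()    # just over 1.1 billion yuan
xpeng_spend = 10 * yuan_per_eflops()     # about 3.7 billion yuan
```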

Although the cost is huge, automakers cannot afford to be slack. Intelligent driving technology has undergone a new paradigm revolution under the AI wave: from the rule-driven traditional solution to the AI-driven and data-driven "end-to-end" solution.

To mass-produce end-to-end intelligent driving products, automakers need to first become beasts that devour massive amounts of data and huge cloud computing power.

Tesla became a "computing power maniac" first. In September last year, Tesla's AI training chip reserve was only about 10,000 cards, but figures in this year's third-quarter earnings report show that Tesla's current AI computing power is roughly equivalent to 67,500 NVIDIA H100 chips. Its computing power reserve has grown more than sixfold in a year.

Source: Tesla official website

This is a staggering number. Tesla's total computing power now stands at approximately 67.5 EFLOPS; for comparison, the total global computing power scale last year was 910 EFLOPS.

However, fed with massive data and trained on supersized computing power, Tesla's end-to-end FSD v12 delivers smoother, more human-like intelligent driving than ever before. This has lured the rest of the automotive industry into the same data and computing power game.

Automakers Suffer from Data Hunger

Intelligent driving under the end-to-end model runs on data and computing power in tandem.

For the data required by end-to-end intelligent driving, Tesla has offered a rule of thumb: end-to-end autonomous driving training needs at least 1 million diverse, high-quality clips (video segments) to work properly; once it reaches 10 million clips, the system's capability becomes incredible.

An industry insider told 36Kr that generally, one clip is about 15 - 30 seconds, and there is no absolutely fixed time length.

Tesla has an obvious data advantage. It has sold 7 million vehicles globally to date. Even if only 1 million of them contribute usable data, one clip per vehicle per day would give Tesla a million training clips daily.

Some industry insiders offered 36Kr a hypothetical: training an 8-billion-parameter model in the cloud requires feeding at least 10,000 hours of driving data into the model's "alchemy furnace", with the data refreshed every two weeks.
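Combining that hypothetical with the 15-30 second clip lengths quoted earlier, 10,000 hours of data corresponds to roughly 1.2 to 2.4 million clips. A minimal sketch, under those assumed clip lengths:

```python
# Rough conversion between training-data hours and clip counts,
# assuming the 15-30 second clip lengths quoted in the article.
HOURS = 10_000

def clips_needed(hours, clip_seconds):
    """Number of clips of a given length that fill `hours` of data."""
    return hours * 3600 / clip_seconds

low = clips_needed(HOURS, 30)    # longer clips -> ~1.2 million clips
high = clips_needed(HOURS, 15)   # shorter clips -> ~2.4 million clips
```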

The earlier an automaker establishes a data-driven closed loop for intelligent driving, the thicker its technical and product moat becomes, and the better its chances of keeping latecomers out.

Li Auto says that early next year at the soonest, it will launch an end-to-end + VLM system trained on more than 10 million clips. Li Liyun, head of XPeng's intelligent driving, also recently stated publicly that the training data for XPeng's end-to-end model has reached 20 million clips.

But high-quality data is not easy to find. Musk once said that the capture of effective user intervention behaviors (high-value training data) is becoming increasingly difficult. "For every 10,000 miles driven, only 1 mile is useful for training the FSD neural network."

Li Auto also stated that currently, there are more than 800,000 vehicle owners, but only 3% of users can truly provide high-quality data.

Several intelligent driving industry insiders told 36Kr that currently, automakers and intelligent driving companies mainly have two ways to obtain data.

One is mining mass-produced vehicles. For the hundreds of thousands of vehicles an automaker has sold, engineers write trigger rules: when a user's driving behavior meets the conditions, the relevant data is uploaded (after desensitization). Owners can also proactively upload special cases.

Intelligent driving suppliers, which lack this advantage in mass-production data backhaul, often instead build in-house fleets of high-quality drivers to collect data on the road.

Data backhaul is itself a considerable cost. According to 36Kr Auto's understanding, one leading intelligent driving supplier spends on the order of 100 million yuan a year on data transmission fees alone; for a new car-making company the bill is even higher.

The second is mining existing data. In the early days, before intelligent driving matured, automakers and suppliers accumulated large volumes of data, much of it useless; engineers can only sift out usable data with algorithmic rules.

High-quality data is the nourishment that determines how well the intelligent driving system iterates. This continuously tests an automaker's ability to automate the closed loop: data collection, cleaning, annotation, training, simulation verification, release, bug fixing, and then a new round of the loop.

Every step of this data pipeline devours computing power. Automakers and intelligent driving technology companies seem to have no way back.

Intelligent Driving Can Make Money, So "End-to-End" Must Be Done However Hard

The benefits brought by end-to-end intelligent driving are within reach.

After launching end-to-end FSD at the end of 2023, Musk emailed front-line sales urging them to get more users to experience the now markedly better, more human-like intelligent driving.

This year, Tesla has pushed FSD penetration with limited-time free trials for all owners in North America, a subscription cut from $199 to $99 per month, and a buyout price cut from $12,000 to $4,500. Tesla has also said FSD will launch in China in the first quarter of next year, opening up further commercial possibilities.

In other words, "end-to-end" makes intelligent driving closer to commercialization than ever before.

In China, "end-to-end" is also accelerating the commercialization process of intelligent driving.

Huawei was the first to taste the sweetness of intelligent driving commercialization. At the end of last year, the AITO M7, built in cooperation with Seres, took 100,000 orders within a little more than two months of launch, with over 60% of buyers choosing the intelligent driving version.

Beyond selling intelligent-driving-equipped models, Huawei also charges for its intelligent driving software package, whereas most automakers currently give their intelligent driving software to users for free.

Unlike Tesla's price cuts, Huawei's intelligent driving software fees keep rising. A Hongmeng salesperson told 36Kr Auto that Huawei's Advanced Driving System (ADS) sold for 3,000 yuan in its 1.0 stage and 6,000 yuan in the 2.0 stage, and ADS 3.0 now sells for 10,000 yuan. "The price will keep going up."

The march from ADS 1.0 to 2.0 to 3.0 tracks exactly the technology and product-experience gains of Huawei's gradual shift from traditional multi-module intelligent driving to end-to-end.

Another player tasting the dividend of intelligent driving technology is Li Auto. With its extended-range family cars already compelling enough to win users, Li Auto has this year been vigorously catching up on its intelligent driving weakness: its end-to-end version has been pushed to all MAX models, and its intelligent driving reputation has recovered.

In its second-quarter earnings call this year, Li Auto said that AD Max (intelligent driving version) models priced above 300,000 yuan account for close to 70% of orders. The AD Max version costs 20,000 yuan more than the AD Pro version; users paying for the pricier model are, in effect, paying for intelligent driving.

American writer Philip K. Dick once described in the novel "Do Androids Dream of Electric Sheep?" that androids have emotions, dreams, and hope to own a live pet.

With the support of end-to-end technology, the intelligent driving system may have begun to "dream" of an electric sheep. However, the maintenance of this electronic dream requires a large amount of resources, and automakers and intelligent driving companies have thus suffered from data and computing power hunger.

Computing Power Game: Buying Cards and Building Data Centers
In addition to selling cars to obtain more data nourishment, the intelligent driving teams of automakers are also preparing chip computing power resources.

The data from Tesla's third-quarter earnings call shows that Tesla's current AI computing power is approximately equivalent to 67,500 NVIDIA H100 chips, and the total computing power is approximately 67.5 EFLOPS.

Tesla stated that by the end of October, Tesla will add another 21,000 H100s. It can be roughly estimated that Tesla's total computing power will reach 88.5 EFLOPS at that time.
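These Tesla figures implicitly treat one H100 as roughly 1 PFLOPS of training compute (67,500 chips equating to 67.5 EFLOPS); that equivalence is an assumption read off the article's numbers, not an official Tesla conversion. Under it, the post-October total works out as:

```python
# H100-equivalent chip counts to EFLOPS, under the ~1 PFLOPS-per-H100
# equivalence implied by the article's figures (an assumption).
PFLOPS_PER_H100 = 1.0

current_h100_equiv = 67_500   # Q3 earnings figure
added_h100 = 21_000           # additions planned by end of October

total_eflops = (current_h100_equiv + added_h100) * PFLOPS_PER_H100 / 1000
# matches the 88.5 EFLOPS estimate in the text
```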

Beyond frantically buying NVIDIA GPUs, Tesla's self-developed chips are also on the way. Musk previously posted on X that by the end of the year, its Dojo supercomputer will deliver training capability roughly equivalent to 8,000 H100 GPUs. Tesla has also projected that once Dojo ramps up, its total computing cluster could reach 100 EFLOPS.

Tesla's ever-growing computing power reserve leaves domestic automaker players no room to fall behind.

After the chip export restrictions, however, NVIDIA's high-end AI chip H100 barely circulates in China. What domestic companies can more readily buy are China-specific chips such as the A800, whose performance falls short of the H100.

Huawei currently holds the largest computing power reserve among Chinese intelligent driving players, at 7.5 EFLOPS. A Huawei insider told 36Kr that internally it uses NVIDIA training chips alongside its self-developed Ascend chips. Although the Ascend toolchain is not especially easy to use, being self-developed it is in ample supply, which lets Huawei expand its cloud computing power quickly.

Li Auto ranks after Huawei with 5.39 EFLOPS. Behind this is a reserve of about tens of thousands of NVIDIA graphics cards.

An industry insider ran the numbers for 36Kr: taking the A800 at the FP16 precision generally used for deep learning training, a single card delivers 320 TFLOPS, so reaching 5.39 EFLOPS requires more than 16,800 A800s. (Li Auto's fleet is not all A800s; this is only a rough estimate. 1 EFLOPS = 1,000 PFLOPS = 1,000,000 TFLOPS.)

An industry insider told 36Kr that after AI large-model companies' rush for computing power subsided this year, cloud training GPUs became relatively easier to buy. Last year an eight-card A800 server could easily fetch over a million yuan; it has now dropped to about 950,000 yuan. Even so, stockpiling computing power remains a huge outlay for domestic automakers.

Li Auto's goal is to reach 8 EFLOPS by the end of the year. According to 36Kr's understanding, Li Auto previously built a data center jointly with the cloud service provider Volcano Engine, and is still scouting the site for a new one.

The computing power of XPeng's intelligent driving center is 2.51 EFLOPS, which can be equivalently converted to more than 7,800 A800s. XPeng's goal is to have a computing power of more than 10 EFLOPS by 2025. NIO's current cloud computing power is 1.4 EFLOPS, which can be equivalently converted to more than 4,300 A800s.

For comparison, according to the information of the Ministry of Industry and Information Technology, as of June 2024, the domestic computing power scale has reached 246 EFLOPS (based on FP32 calculation). If converted to FP16, it is 492 EFLOPS. And the total cloud computing power of Huawei, NIO, Li Auto, and XPeng accounts for about 3.5% of the national computing power scale.
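The per-company conversions and the national-share figure can be reproduced from the 320 TFLOPS-per-A800 assumption used throughout this section. This is a rough equivalence only, since the actual fleets mix chip types:

```python
# A800-equivalent card counts and national share, using the 320 TFLOPS
# FP16 figure quoted earlier (a rough equivalence, not exact inventories).
A800_TFLOPS = 320

def a800_equiv(eflops):
    """Number of A800s delivering the given FP16 EFLOPS."""
    return eflops * 1_000_000 / A800_TFLOPS

li_auto = a800_equiv(5.39)   # more than 16,800 cards
xpeng   = a800_equiv(2.51)   # more than 7,800 cards
nio     = a800_equiv(1.40)   # more than 4,300 cards

total_eflops = 7.5 + 5.39 + 2.51 + 1.40   # Huawei + Li Auto + XPeng + NIO
share = total_eflops / 492                # vs. 492 EFLOPS FP16 nationwide
# about 3.4%, consistent with the "about 3.5%" cited in the text
```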

But end-to-end is not a game for giants alone; small and medium players are squeezing into the arena too. Intelligent driving suppliers often team up with automakers to enter the battlefield quickly, as in the pairings of IM Motors with Momenta and Great Wall with DeepRoute.ai.

According to 36Kr Auto's understanding, the training chip reserves of some leading intelligent driving suppliers have also reached the thousands. Momenta and Horizon Robotics, for example, struck cooperation deals with Volcano Engine last year worth hundreds of millions of yuan.

Over the past two years, the world has fallen into an AI large-model frenzy. The entry ticket for a domestic AI large-model startup runs as high as 50 million US dollars, and the highest-valued large-model company, Moonshot AI ("The Dark Side of the Moon"), is now valued at 23.6 billion yuan.

Domestic leading AI large-model companies are now pushing towards trillion-parameter models, which demand enormous computing power pools to match. Large-model companies such as StepStar (StepFun) and Kimi are building 10,000-card training clusters through cooperation with cloud service providers.