
Momenta's Cao Xudong: The intelligent driving race will be decided within two years | An interview with 36Kr

Anqi Li · 2025-01-23 09:00
To achieve a good intelligent driving experience, an investment of tens of billions of yuan is required annually.

Written by Anqi Li

Edited by Qin Li

Over the past two years, AGI (artificial general intelligence) has been through a round of heavy capital investment and the exit of some players, yet its "fatal attraction" keeps winning converts. At the beginning of 2025, Li Xiang, the CEO of Li Auto, put forward the vision that "the future vehicle will be a silicon-based family member". The once high, solid boundaries between the automotive and robotics industries are gradually dissolving.

Cao Xudong, the founder of the intelligent driving company Momenta, also believes that AGI is the ultimate dream of every AI practitioner. If the arrival of the iPhone was a thousand-meter wave, he believes the general-purpose robot is an even more disruptive opportunity.

But Cao Xudong is not entering that field yet. "If we do (robots), it might be something in 2027 or 2028." For now, the fiercer battlefield is still the fight for market share in intelligent driving.

Cao Xudong expects high-level intelligent driving to take off at scale this year, and the winners of the intelligent driving industry to be decided in 2026.

"This year, urban NOA (Navigate on Autopilot) will be available in models priced at 150,000 yuan. At the end of this year and the beginning of 2026, urban NOA will also reach models in the 100,000-yuan class."

For Momenta, already in the top tier, this is no moment to relax. After the rapid development of electrification, the competitive focus of the Chinese auto market has shifted toward intelligence. Since last year, companies such as Huawei, XPeng, and Li Auto have taken turns going on the offensive, from the battle over "intelligent driving that can be used nationwide" to the race for "end-to-end" technology.

Momenta is one of the few intelligent driving technology companies with "end-to-end" mass production capability.

From 2022 to 2024, the number of mass-produced models carrying Momenta's high-level intelligent driving was 1, 8, and more than 20, respectively. "The mass production scale this year may increase several times," Cao Xudong said.

Recently, we met founder Cao Xudong at Momenta's headquarters in Suzhou. Asked about Momenta's rapid rise in the end-to-end intelligent driving race, he said repeatedly, "We just did it early enough."

Cao Xudong, founder of Momenta (image: official source)

"End-to-end" advocates using a large model to integrate the perception, prediction, decision-making, and control stages of intelligent driving. It marks a shift from engineers hand-writing rules that tell the vehicle how to drive to using large AI models, massive data, and cloud computing power so that intelligent driving can evolve on its own.

Cao Xudong told 36Kr Auto that the company has long tried to bring AI models into intelligent driving. For example, it has used the Transformer for prediction and path planning since 2019. In early 2023 it mass-produced a two-stage end-to-end solution, which evolved into a one-stage end-to-end solution in 2024.

On the engineering side, Momenta also targeted mass-production intelligent driving early. Since 2021, it has worked closely with automakers such as IM Motors and BYD.

Cao Xudong was frank about how hard mass production is: "To cooperate with Chinese automakers, it takes at least 3 years to 'knock on the door', and for international automakers, it takes more than 5 years. The actual mass production time may require 10 years." For example, Momenta first approached a certain multinational automaker in 2017, and it took 8 years to get from passing supply-chain qualification to an actual mass-production launch.

He believes that companies that enter an automaker's mass production cycle earlier gain mass production experience sooner, obtain more data, and then iterate quickly. For example, during mass production Momenta developed a data-driven approach to meet the evolving needs of end-to-end models, and designed a set of automated toolchains to adapt to the different hardware requirements of different automakers.

As for how fast it can support an automaker's mass production, Momenta needs only three months of hardware deployment and algorithm debugging from the start of cooperation to vehicle delivery.

These have become the "original accumulation" that established Momenta as a leading intelligent driving company, but industry competition and challenges are also becoming increasingly fierce.

Cao Xudong said in this regard that the more advanced the intelligent driving, the more difficult it is. Currently, an investment of several billion yuan a year can reach the level of the second echelon or the quasi-first echelon. But in the future, it will require tens of billions to achieve the same level. "The gap may be widening, not narrowing."

Because "end-to-end" is a long-term competition, Momenta is ready to invest a huge amount of resources. Cao Xudong believes the richness and complexity of road data in China are much higher than in Europe and Japan. "Sometimes we joke that our golden data may be more than that of Tesla."

Cao Xudong believes that achieving mass-produced L4 autonomous driving will require annual R&D investment of at least tens of billions or even hundreds of billions, with most of the spending going to cloud computing power.

"The current bottleneck is not the supply of raw data, but the high cost of cloud computing power. There is not so much money to burn."

With "end-to-end" technology, Momenta is turning the "time barrier" it built through mass-producing intelligent driving into a "resource barrier", an indispensable bargaining chip for staying in the endgame. At the same time, Momenta is still pushing ahead on its Robotaxi business.

Momenta plans to achieve a fully driverless Robotaxi (driverless taxi) in 2025. "We are different from the rest of the industry. We will reuse the sensors and domain controllers of mass-produced vehicles to make Robotaxi, and the gross profit is positive. We will not burn money to expand the scale," Cao Xudong said.

The following is a lightly edited conversation between 36Kr Auto and Cao Xudong, the founder of Momenta:

"On Investment: High-Level Intelligent Driving Has Detonated, and the Subsequent R&D Investment Requires Tens of Billions"

36Kr Auto: In 2024, the penetration rate of new energy vehicles exceeded 50%. What do you think the penetration rate of intelligent driving will be in 2025?

Cao Xudong: The penetration rate of medium and high-level intelligent driving should be 10% - 20%.

36Kr Auto: As an upstream supplier for top automakers that require intelligent driving as a standard configuration, how much growth will your business experience?

Cao Xudong: The mass production scale may increase several times.

36Kr Auto: To what price range will high-level intelligent driving models extend?

Cao Xudong: In 2025, I think it can reach around 150,000 yuan. At the end of 2025 and the beginning of 2026, urban NOA may also be available in models in the 100,000-yuan class. In addition, the BOM cost of high-level intelligent driving is falling rapidly, and the intelligent driving experience and safety will improve by 10 times, 100 times, or even 1000 times. High-level intelligent driving will gradually become a standard configuration for automakers.

36Kr Auto: What real-world usage data on intelligent driving have you seen internally?

Cao Xudong: OEMs (Original Equipment Manufacturers) keep bringing high-level intelligent driving down from 300,000-yuan vehicles to 200,000-yuan, 150,000-yuan, and gradually 100,000-yuan vehicles. That indicates consumers are willing to pay, because only then are automakers willing to pay. From our back-end data, users engage intelligent driving for roughly 50% of their mileage.

36Kr Auto: Intelligent driving is very popular, but many intelligent driving companies have not yet made a profit. When will this situation change?

Cao Xudong: Autonomous driving requires huge R&D investment, and it will be even greater in the future. If mass-produced L4 is to be achieved, the annual R&D investment should be at least tens of billions or even hundreds of billions. Do the math: if the cost is spread over 1 million vehicles, the intelligent driving cost per vehicle is 10,000 RMB; over 10 million vehicles, it is about 1,000 RMB per vehicle. To balance the R&D investment, the scale should be several million vehicles.

2024 was the detonation point for high-level intelligent driving, but the scale has not really picked up yet. To be profitable in the end, you have to achieve economies of scale. Once the number of players shrinks, revenue can cover the R&D cost and companies can turn a profit.
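The amortization arithmetic in this answer can be checked with a short sketch. It assumes an annual R&D budget of 10 billion RMB, the low end of the "tens of billions" range and the figure that reproduces the per-vehicle numbers quoted; the function name and fleet sizes are illustrative, not Momenta's.

```python
def rd_cost_per_vehicle(annual_rd_rmb: int, vehicles: int) -> float:
    """Spread an annual R&D budget evenly over the vehicles shipped that year."""
    if vehicles <= 0:
        raise ValueError("vehicles must be positive")
    return annual_rd_rmb / vehicles


# Assumption: 10 billion RMB per year, the low end of "tens of billions".
ANNUAL_RD_RMB = 10_000_000_000

for fleet in (1_000_000, 10_000_000):
    cost = rd_cost_per_vehicle(ANNUAL_RD_RMB, fleet)
    print(f"{fleet:>10,} vehicles -> {cost:,.0f} RMB per vehicle")
```

Under this assumption the per-vehicle burden falls tenfold for every tenfold increase in volume, which is why the answer ties profitability to reaching a scale of several million vehicles.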

36Kr Auto: When will the reshuffle in the intelligent driving industry end? What skills are needed to stay in the game?

Cao Xudong: It will probably end by the end of 2026, and the winners will be determined. To stay in the game, several things matter. First, good technology and products, backed by strong organizational capability and a strong R&D system. This is the most important necessary condition.

Second, there must still be a first-mover advantage, especially in mass-produced autonomous driving. It takes three years to knock on an automaker's door for cooperation. If you do not get the chance to enter an automaker now, it will be very difficult to enter later. Those who get in early will gain mass production experience sooner, obtain more data, and then iterate quickly.

36Kr Auto: The annual investment of tens of billions is not small. Where does this money come from? Where is it mainly spent?

Cao Xudong: Most of it definitely comes from revenue. Our R&D share will be very high. We are not a hardware company, so our gross margin will also be very high. When a vehicle is sold, it is a pure software license, and the gross margin is like that of Microsoft selling Office. The R&D investment is huge, but once the product is developed, the marginal cost is almost zero.

It is mainly spent on personnel and cloud computing power. The more advanced the technology, the more computing power is required. In 2027 and 2028, the investment in computing power will be significantly more than that in personnel.

36Kr Auto: How much have you invested in computing power this year?

Cao Xudong: Intelligent driving just exploded in 2024, and the scale will rise in 2025. We will have an investment of several billion yuan in computing power by 2027.

"On End-to-End: The Bottleneck Is Not the Amount of Data, but Golden Data and Computing Power"

36Kr Auto: Momenta has been very fast in end-to-end mass production. Looking back, what did you get right?

Cao Xudong: The fundamental reason is that we started early. Transformer came out in 2018, and we used it for deep learning prediction in 2019, and for deep learning planning in 2020. We mass-produced the two-stage end-to-end in early 2023, although the term did not exist at that time.

In the first half of 2024, we developed the one-stage end-to-end. Behind this is our accumulated talent and R&D system. End-to-end really means using a single model that takes camera input and outputs a trajectory. Why has it only become popular recently? Because even when the direction is correct, the path that succeeds may be one in a million. Without prior accumulation, it is difficult to find the correct path.

36Kr Auto: What do you think are the challenges of doing end-to-end? Is it difficult to use rules to back up the model?

Cao Xudong: I think using rules to back up the end-to-end may be wrong. Because there are various corner cases that the end-to-end model cannot handle, that's why a backup is needed. But logically speaking, if the end-to-end model has the ability to handle corner cases, why not solve the problems within the end-to-end model?

I have always believed that the rule-based system and the end-to-end model are mutually redundant, and that the relationship cannot be called a backup. The amount of rule code keeps shrinking. It is impossible to use less and less rule code to back up a large end-to-end model.

Backing up implies that there are millions of corner cases that have not been solved. It is not realistic to use rules to back up millions of long-tail problems. Otherwise, L4 could be achieved by relying on this backup code alone.

36Kr Auto: Has the technological evolution of end-to-end been smooth sailing?

Cao Xudong: End-to-end is just the beginning. Where does the training data come from? There is a lot of data engineering work here. Many people do not really understand what data-driven means. They think it is just a matter of producing and handling data, and they are not willing to do the dirty and tiring work. If you think this way, end-to-end cannot be done well at all.

Data engineering must be treated as something more important than software engineering, and more in need of systematic construction. It is like making chips: the raw material is not just sand. Thinking that having sand means you can make chips would be a joke. The silicon used as raw material for chips needs a purity of 9 nines to 12 nines. Purifying the sand takes an entire industrial system. Similarly, giving the end-to-end model better data also requires a complete system to support it.

36Kr Auto: Has the data-driven closed loop achieved the desired effect now?

Cao Xudong: It still needs to be improved because L4 has very high requirements for this data-driven flywheel system. L4 has long-tail problems that may occur once every 10,000 kilometers or 100,000 kilometers. How to verify this?

If you rely on your own fleet for road tests, you may only run tens of thousands of kilometers in a week, and you may not encounter a single case in that week. Then after a new version is released, how do you know whether the corner cases have improved or worsened? That is why data is collected through the shadow mode of a large number of mass-produced vehicles, for verification and closed-loop simulation.
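A rough calculation makes this verification gap concrete. The sketch below assumes long-tail events occur independently at a constant rate per kilometer (a Poisson model, which is a simplification) and uses illustrative weekly mileage: 30,000 km for an in-house test fleet versus 10,000,000 km for a mass-produced fleet running in shadow mode. These mileage figures and function names are assumptions for illustration, not Momenta's data.

```python
import math


def expected_events(km_driven: float, km_per_event: float) -> float:
    """Expected number of long-tail events seen over the distance driven."""
    return km_driven / km_per_event


def prob_at_least_one(km_driven: float, km_per_event: float) -> float:
    """Chance of seeing the event at least once, under a Poisson assumption."""
    return 1.0 - math.exp(-expected_events(km_driven, km_per_event))


# Assumed weekly mileage (hypothetical): in-house test fleet vs. shadow-mode fleet.
TEST_FLEET_KM = 30_000
SHADOW_FLEET_KM = 10_000_000
# Event rarity taken from the interview: once every 100,000 km.
KM_PER_EVENT = 100_000

for label, km in (("test fleet", TEST_FLEET_KM), ("shadow mode", SHADOW_FLEET_KM)):
    n = expected_events(km, KM_PER_EVENT)
    p = prob_at_least_one(km, KM_PER_EVENT)
    print(f"{label:>11}: {n:.1f} expected events/week, P(at least one) = {p:.3f}")
```

With these assumed numbers, the small fleet expects well under one occurrence per week, so a single week of road testing says little about whether a new version handles the case better or worse, while the large fleet sees the event many times over.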

36Kr Auto: End-to-end models demand more and more data. Some intelligent driving suppliers seem unable to obtain data from their partner automakers. How do you solve this problem?

Cao Xudong: It shows that the trust relationship between the supplier and the automaker is not in place. I think the key lies in whether the intelligent driving company can create value for users and customers. There is no such thing as a free lunch.

The consensus we have reached with our automaker customers is that the on-vehicle system screens for corner cases (long-tail scenarios), and we identify and send back the scenarios that are handled poorly for model training, so that we can improve the model's capability and the product experience in a targeted way.

36Kr Auto: Can this data-sharing arrangement be reached with every automaker?

Cao Xudong: Yes.

36Kr Auto: If the automaker is not willing to share data, will that become a barrier to cooperation?

Cao Xudong: This is not a barrier. We are very relaxed about it. If the customer is willing to share data, we will cooperate. Our data processing is standardized and automated.

If the automaker considers the data as an asset and