Catching Up with FSD V14: What Lessons Is Li Auto Making Up For?

Why are more and more automakers developing chips in-house?

In the past few years, the focus of competition in the intelligent driving industry has undergone several obvious changes.

Initially, the competition was about hardware: whether to use lidar, how many cameras to install, and how much computing power (in TOPS) to provide. Then, with the arrival of the large-model era, the competition shifted to end-to-end, VLA (Vision-Language-Action), World Model, and other routes.

Today, more and more companies have found that simply having a larger model is no longer enough to create a generational advantage. What increasingly determines the ceiling is whether a continuously iterating closed loop can be formed among the model, data, computing power, and chips.

This is why more and more car companies are starting to develop these layers in-house.

Tesla covers almost every layer, from data collection and training infrastructure to the FSD model, Dojo, and self-developed chips. In China, XPeng, NIO, and Li Auto are all steadily extending toward more fundamental levels.

In the L8 and L9 models released this year, Li Auto has already used its self-developed Maher M100 chip. Li Auto regards the chip's data-flow architecture as a major technological direction for AI. Li Auto also runs its self-developed Maher VLA model on the Maher M100.

For the industry, however, the more important question is not whether a company develops in-house, but what problems these investments can actually solve.

With this question in mind, we spoke with Zhan Kun, who leads autonomous driving at Li Auto, and Xie Yan, who leads chips. They discussed Li Auto's judgment on the next-generation autonomous driving technology route and explained the design logic behind its self-developed chips, data systems, and AI infrastructure. The following is an edited excerpt of the interview:

Question: What work does Li Auto still need to do to match the performance of Tesla's FSD V14 in the fourth quarter?

Zhan Kun: I think catching up with FSD happens at two levels.

The first is the basic experience, specifically whether the sense of security, efficiency, and comfort can reach the same level as FSD. FSD offers a very high sense of security, good efficiency, and good comfort. These are its fundamentals. The system doesn't necessarily have to drive on difficult roads, but these fundamentals should reach that level.

The second is capabilities, which are also very difficult to catch up on. For example, Tesla can give way to special vehicles, has high-precision perception in extremely narrow passages, and can recognize traffic police commands. These capabilities are very strong.

On capabilities, there is an opportunity in upgrading the architecture. Why does only Tesla have these capabilities while others don't? It may be that previous paradigms restricted them, for architectural and data reasons. We have made a lot of attempts at this level.

Question: I understand that Maher VLA is a technical system rather than a single model. For example, Mind-Edge is an edge-side model for the intelligent cockpit. Is there still an "L" (Language) part in the current intelligent driving model?

Zhan Kun: Currently, there is a common trend in the architecture of autonomous driving, which is to integrate VLA (Vision-Language-Action model) and World Model.

In the long run, everyone will go in this direction. Whether it's VLA or World Model, the prompt inside needs to use language. So there must be language; it's just a matter of how to use it.

In terms of machine intelligence, I think a vision-based approach is more reasonable. It is better suited to spatial understanding, 3D spatial perception, and environmental services. Language is definitely useful and valuable for understanding the environment, traffic, and instructions, and for complex thinking and decision-making.

In the long run, a foundation model built natively on both vision and language may be the future trend.

Xie Yan: If you want to move towards L3 and L4 and solve more generalized problems, your model needs to have the same thinking ability as humans. The importance of language will become more prominent, which is also the reason why a large amount of computing power is needed in the future.

If it only has Vision and Action and has a lot of data, when it encounters a situation outside the distribution, it won't know what to do. Even if an animal has learned all common situations, it won't be able to handle a situation it has never seen before and won't know which choice is correct.

We believe that as we move towards L3 and L4, the problems we need to solve are increasingly the ones that remain beyond the 90%, 95%, and 98% marks: problems you've never seen before. The model needs to have the ability to think like a human, and the source of the ability to reason and think like a human is the language model. For example, when a police officer is making a gesture, you need to understand whether he is allowing you to go or not. This is not a problem that can be solved by collecting or generating data.

Question: As Li Auto's vehicle fleet has grown sharply, has the marginal effect of data declined from Li Auto's internal perspective? How do you define valuable data?

Zhan Kun: First, the amount of data needs to be large enough. The essence is to collect more corner cases (long-tail scenarios). Many teams now have ways to build good neural triggers on the vehicle side that judge whether a scenario is difficult or simple, and then transmit the key data back. This is also one of the important reasons why Tesla is so strong now.
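
The vehicle-side trigger idea described here can be sketched roughly as follows. This is only an illustration, not Li Auto's or Tesla's implementation: the `Frame` fields, the hand-written `difficulty_score` heuristic (standing in for a learned neural trigger), and the 0.7 threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One logged driving moment (all fields are hypothetical)."""
    frame_id: int
    planner_uncertainty: float  # 0..1, how unsure the driving model was
    driver_took_over: bool      # whether the human intervened


def difficulty_score(frame: Frame) -> float:
    """Stand-in for a learned on-vehicle trigger: higher means harder."""
    score = frame.planner_uncertainty
    if frame.driver_took_over:
        score = max(score, 1.0)  # treat any takeover as a hard case
    return score


def select_for_upload(frames: list[Frame], threshold: float = 0.7) -> list[int]:
    """Return the ids of frames judged difficult enough to send back."""
    return [f.frame_id for f in frames if difficulty_score(f) >= threshold]


frames = [
    Frame(1, 0.10, False),  # easy cruising, stays on the vehicle
    Frame(2, 0.85, False),  # model was unsure, gets uploaded
    Frame(3, 0.20, True),   # driver intervened, gets uploaded
]
uploaded = select_for_upload(frames)
```

The point of the sketch is that filtering happens on the vehicle, so only the small share of frames judged difficult is transmitted back.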

Second, the quality needs to be high, mainly referring to high-quality behavior. Now, people are gradually converging to the end-to-end paradigm. Whether it's VLA (Vision-Language-Action model), World Model, or Vision-Action model, it doesn't matter, but you must know the Action behavior. At this time, the quality of behavior is very important, and the cleanliness and consistency of behavior are crucial.

As for whether the marginal effect has declined as the data scale has increased: as long as the model keeps improving and we are striving for a perfect score, the gain must follow a "logarithmic curve" and gradually taper off. It's impossible to grow linearly. No company doing AI can achieve that. Although convergence from data does get slower the further we go, we hope to speed it up through scale.
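
A toy calculation can make the "logarithmic curve" concrete. The functional form and the constants `a` and `b` below are assumptions chosen for illustration, not a measured scaling law from Li Auto.

```python
import math


def model_score(num_clips: float, a: float = 10.0, b: float = 20.0) -> float:
    """Toy curve: score grows with the base-10 logarithm of data volume."""
    return a * math.log10(num_clips) + b


def marginal_gain(num_clips: float, extra_clips: float) -> float:
    """Score improvement from adding extra_clips on top of num_clips."""
    return model_score(num_clips + extra_clips) - model_score(num_clips)


# The same one million extra clips helps far less once the dataset is large.
gain_early = marginal_gain(1e6, 1e6)  # going from 1M to 2M clips
gain_late = marginal_gain(1e8, 1e6)   # going from 100M to 101M clips

# Under this curve, every tenfold increase in data adds the same `a` points.
gain_per_10x = model_score(1e7) - model_score(1e6)
```

Under this assumed curve, each fixed step in score requires multiplying the data, which is one way to read the remark that scale is needed to keep convergence from slowing.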

Question: The Maher M100 can run in different AI scenarios. Five years or two product generations from now, is it possible that all the computing centers in Li Auto's vehicles will use self-developed Maher chips?

Xie Yan: Although the industry talks about "cockpit-driving integration", we believe the core of cockpit-driving integration is the AI computing power. Whether the other parts are integrated is not that crucial. The cockpit system and the AI intelligent driving system can be completely independent, while the AI computing power is pooled together, which greatly improves allocation efficiency.

The final form of our roadmap is an in-vehicle AI computing center where all AI tasks can be computed. It is like running OpenClaw on a laptop: the AI computation happens not on the laptop but on the Token Provider Server. The car is similar, with its own Token Server.

The advantages of this Token Server are, first, very high efficiency and, second, the ability to isolate different tasks from one another so that they do not interfere. For example, the determinism of intelligent driving tasks, whether in memory or bandwidth, can be guaranteed against interference from other tasks. This can only be achieved through the joint design of software and hardware.
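
The isolation property can be illustrated with a toy resource manager. This is a software-only sketch under stated assumptions: the `TokenServer` class, its methods, and every number below are hypothetical, and real isolation of memory and bandwidth would depend on hardware support, as the speaker notes.

```python
class TokenServer:
    """Toy manager for a shared in-vehicle AI compute pool (hypothetical)."""

    def __init__(self, total_memory_gb: float, total_bandwidth_gbps: float):
        self.free = {
            "memory_gb": total_memory_gb,
            "bandwidth_gbps": total_bandwidth_gbps,
        }
        self.reserved: dict[str, dict[str, float]] = {}

    def reserve(self, task: str, memory_gb: float, bandwidth_gbps: float) -> bool:
        """Grant a task an exclusive slice, or refuse if the pool cannot cover it."""
        request = {"memory_gb": memory_gb, "bandwidth_gbps": bandwidth_gbps}
        if any(request[key] > self.free[key] for key in request):
            return False  # never dip into another task's reservation
        for key in request:
            self.free[key] -= request[key]
        self.reserved[task] = request
        return True


server = TokenServer(total_memory_gb=64, total_bandwidth_gbps=200)

# The driving task reserves its slice first.
driving_ok = server.reserve("driving", memory_gb=32, bandwidth_gbps=120)

# A cockpit request that would exceed the remaining bandwidth is refused...
cockpit_big_ok = server.reserve("cockpit_assistant", memory_gb=16, bandwidth_gbps=100)

# ...while one that fits in the leftover pool is granted.
cockpit_small_ok = server.reserve("cockpit_assistant", memory_gb=16, bandwidth_gbps=60)
```

In this sketch the driving reservation is never touched by later requests, which is the deterministic behavior the answer describes.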

Question: Is it because the M100 is an AI inference chip with a data-flow architecture that it has a lower demand for bandwidth and a higher demand for on-chip storage compared to the autonomous driving chips of other competitors?

Xie Yan: Our bandwidth requirement is lower, but that is not the direct reason for the SRAM capacity we designed (SRAM, not video memory). HBM (High-Bandwidth Memory) is very popular now, and many people think the higher the bandwidth, the better. But compute, bandwidth, and SRAM all cost transistors. The final design is a trade-off that weighs cost, overall performance, and other factors.

It's neither reasonable nor professional to compare different architectural designs on just one or two indicators. It's like a boxing match. Taller people have their advantages, and heavier people have theirs, but the outcome is not determined by a single indicator. What matters in the end is the result of the match.

Question: Why do the current high-computing-power chip solutions, such as those developed by NVIDIA, XPeng, and Li Auto, not implement chip-level cockpit-driving integration, while Qualcomm has done this on low-computing-power chips?

Xie Yan: In essence, the cockpit and driving are two independent systems. Especially for high-end L3 moving towards L4, intelligent driving requires a more deterministic system, with memory and computing resources held exclusively. At that point, integration means much less, because resources cannot be switched in real time, and real-time switching would reduce determinism. The more exclusive the resources become, the less value integration has: you are just putting the chips together, but there are still two sets of resources, which brings no cost reduction and may even hurt efficiency.

If you look at the current cockpit-driving integration systems, they are still separate. You can't run one task and then another right away. If you can't do that, putting two chips into one chip may not change the number of transistors, but only save the cost of one packaging. For mid- and low-end chips, this part of the cost can be saved, but not much.

In my opinion, as intelligent driving becomes more and more high-end in the future, cockpit-driving integration may not be very meaningful. If these chips are placed closer together and made into a small-volume integrated solution on a board, that's okay. There's no need to make them into one chip; multiple chips can be placed together.

Question: What conditions are needed behind self-developed chips, such as sales volume, revenue, and R&D investment? What conditions are needed for chips to be continuously iterated considering the current rapid iteration speed of autonomous driving?

Xie Yan: The initial investment in chips is indeed quite large, perhaps several hundred million yuan a year.

The first condition is to reach a certain revenue scale. For a car company with annual revenue of over 100 billion yuan and R&D investment of at least 10%, the R&D budget comes to several billion to tens of billions of yuan, so investing in chip R&D every year is feasible. The second condition is that the problems your chip R&D solves should make your product more capable.
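
The budget arithmetic in this answer can be checked with a short calculation. The revenue and R&D ratio are the thresholds quoted in the interview; the 500 million yuan chip figure is an assumed point within "several hundred million yuan a year", picked only for illustration.

```python
ANNUAL_REVENUE_YUAN = 100_000_000_000  # "over 100 billion yuan" (lower bound)
RD_PERCENT = 10                        # "at least 10%" (lower bound)
CHIP_SPEND_YUAN = 500_000_000          # assumption: one value for "several hundred million"

# Integer arithmetic keeps the yuan amounts exact.
rd_budget_yuan = ANNUAL_REVENUE_YUAN * RD_PERCENT // 100

# Share of the yearly R&D budget that the assumed chip spend would take.
chip_share_of_rd = CHIP_SPEND_YUAN / rd_budget_yuan
```

At these threshold values the R&D budget is 10 billion yuan, and the assumed chip spend takes about 5% of it.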

Many people say that chips need a large shipment volume. In fact, the cost of chips is related to area. For example, the intelligent driving chips in a car, such as the two Maher M100 chips in a Li Auto vehicle, add up to 800 square millimeters, while a high-end mobile phone chip is about 100 square millimeters. So the intelligent driving chips in one car are equivalent in area to 8 mobile phone chips.

Calculated in this way, the wafer area required for hundreds of thousands of cars is very large, which can completely spread the cost. So the cost cannot be measured only by the number of chips.
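
The area argument can be worked through numerically. The 800 and 100 square millimeter figures are the ones quoted above; the fleet size of 300,000 cars is an assumption standing in for "hundreds of thousands", and the calculation ignores yield and wafer-edge losses.

```python
AREA_PER_CAR_MM2 = 800  # two M100 chips per vehicle, as quoted
PHONE_CHIP_MM2 = 100    # one high-end phone chip, as quoted


def phone_chip_equivalents(cars: int) -> float:
    """Silicon area of the fleet, expressed as a count of phone-sized chips."""
    return cars * AREA_PER_CAR_MM2 / PHONE_CHIP_MM2


cars = 300_000  # assumption: one value for "hundreds of thousands of cars"

per_car = phone_chip_equivalents(1)
fleet_equivalents = phone_chip_equivalents(cars)

# Total die area in square meters (1 m^2 = 1,000,000 mm^2).
fleet_area_m2 = cars * AREA_PER_CAR_MM2 / 1_000_000
```

Measured by area, the assumed fleet consumes as much silicon as 2.4 million phone chips, which is why unit count alone understates the volume over which costs are spread.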

Question: What exactly is the difficulty of the dynamic data-flow compiler, and how long did it take to overcome it?

Xie Yan: The compiler work started before tape-out, as early as the design stage. By the time of tape-out, many models had already been run successfully.

The data-flow architecture is completely different. The problems it needs to solve are very similar to those faced by supercomputers or large-scale computer clusters. When the scale expands to hundreds of thousands of computers and millions of cores that communicate and cooperate with each other, you can't have a central administrator manage hundreds of thousands of cores. The scheduling method of the traditional von Neumann architecture is not feasible at this scale. This is a very large-scale parallel scheduling problem.
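
The general data-flow idea, where an operation runs as soon as its inputs arrive rather than when a central controller schedules it, can be sketched in a few lines. This is a generic textbook-style simulation, not the M100 hardware or its compiler; the `Node` class and the small graph are hypothetical, and the queue only simulates in software the fact that nodes become ready on their own.

```python
from collections import deque


class Node:
    """An operation that fires once all of its input ports hold a value."""

    def __init__(self, name, op, num_inputs):
        self.name = name
        self.op = op
        self.num_inputs = num_inputs
        self.inbox = {}      # port index -> value received so far
        self.consumers = []  # (downstream node, port index) pairs

    def receive(self, port, value, ready):
        """Accept one input; announce readiness when the last one arrives."""
        self.inbox[port] = value
        if len(self.inbox) == self.num_inputs:
            ready.append(self)

    def fire(self, ready):
        """Compute the result and push it directly to downstream nodes."""
        args = [self.inbox[port] for port in range(self.num_inputs)]
        result = self.op(*args)
        self.inbox.clear()
        for consumer, port in self.consumers:
            consumer.receive(port, result, ready)
        return result


# Graph for (a + b) * (c + d); no node knows the global execution order.
add1 = Node("add1", lambda x, y: x + y, 2)
add2 = Node("add2", lambda x, y: x + y, 2)
mul = Node("mul", lambda x, y: x * y, 2)
add1.consumers.append((mul, 0))
add2.consumers.append((mul, 1))

ready = deque()
add1.receive(0, 1, ready)
add1.receive(1, 2, ready)
add2.receive(0, 3, ready)
add2.receive(1, 4, ready)

results = {}
while ready:
    node = ready.popleft()
    results[node.name] = node.fire(ready)
```

Each node decides locally when it can run, and the two additions are independent of each other, so on parallel hardware they could execute at the same time without a coordinator ordering them.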

This article was originally produced by 肖漫. Unauthorized reprinting is prohibited.
