
Li Auto CTO on OS Open Source: We've Identified Four Issues in Smart Cars | Exclusive Interview by 36Kr

李勤 2025-04-16 17:25
How should intelligent vehicles evolve? The underlying system needs to figure it out.

As cars evolve towards intelligent entities, they need a more robust underlying system. New carmakers at the forefront of this exploration, such as NIO and Li Auto, have prepared early: they have invested heavily in vehicle-level operating systems and launched OS products that span the entire vehicle.

On March 27, Li Auto CEO Li Xiang announced at the Zhongguancun Forum that the company would open-source its self-developed automotive operating system, "Li Auto Starry Ring OS". Reportedly, Li Auto Starry Ring OS is an automotive operating system designed for the era of artificial intelligence and built on four pillars: intelligent vehicle control, intelligent driving, communication middleware, and information security systems.

Recently, Xie Yan, the CTO of Li Auto, also participated in interviews with media including 36Kr, further elaborating on the development process, strategic mission, and technical highlights of Li Auto Starry Ring OS.

According to Xie Yan, the Li Auto Starry Ring OS project was launched in 2021, with an R&D team of 200 people. Over the years, cumulative R&D investment has exceeded one billion yuan, and in 2024 the self-developed operating system was installed in vehicles for the first time.

Xie Yan is a veteran of China's operating system industry. Before joining Li Auto, he served as vice president of software engineering and head of the OS department in Huawei's consumer business group, where he was responsible for R&D of operating system technologies such as HarmonyOS. Before that, from 2014 to 2019, he was chief architect and rotating general manager of Alibaba's AliOS business group, responsible for technical architecture design and R&D management of mobile operating systems.

Xie Yan, CTO of Li Auto. Photo: Li Auto

In 2020, the pandemic squeezed global supply chains and triggered a three-year chip shortage. When Xie Yan joined Li Auto in July 2022, the industry was still in the middle of it. Xie Yan recalled that the delivery cycle for some chips stretched from one month to six, and prices rose 5 to 10 times. A Bosch ESP chip that had cost 13 yuan each, for example, soared to 4,000 yuan on the second-hand market.

At this time, the first challenge facing domestic car manufacturers was to quickly adapt to chips from different manufacturers to ensure supply.

Xie Yan said that with its self-developed operating system, Li Auto cut the time needed to adapt to a new chip from six months to one month, supported automotive chips of various architectures, and gained freedom in chip selection, which greatly eased the impact of the chip shortage on its supply chain.

Since then, as smart cars have continued to evolve, the industry has seen cross-domain integration across the vehicle, an explosion of on-board computing power, and a clearer direction towards intelligent entities. Starry Ring OS has been upgraded step by step in response.

Xie Yan said that Starry Ring OS will build four core technical capabilities in line with where smart cars are heading. On cabin-driving integration, a much-discussed direction in the industry, he put forward the concept of computing power pooling: Li Auto Starry Ring OS incorporates virtualization technology that abstracts the underlying heterogeneous computing resources (such as CPUs and NPUs) into a unified "computing power pool".

However, Xie Yan stressed that this is not quite the same as cabin-driving integration. "What we mean is opening up AI computing power to all domains once it is pooled. Cabin-driving integration also brings problems, because driving and cabin functions iterate at different speeds and have different safety requirements. When they are combined, there is a higher standard and a lower standard. Generally, the lower standard has to be raised to the higher one, which does not necessarily cut costs and may even increase costs and reduce flexibility."

In addition, in response to the information security issues that future smart cars will face, Li Auto Starry Ring OS has also designed native security mechanisms such as data encryption and protection, system integrity protection, identity authentication and access control.

On April 15, Li Auto released the "Li Auto Starry Ring OS Technical White Paper" on its official website, detailing the technical architecture and core systems of Li Auto Starry Ring OS and formally putting its open-source plan into effect.

Asked why the open-source systems already available in the industry have not been widely adopted, Xie Yan said that, first, most existing open-source projects duplicate existing work, leave many problems unsolved, and do not create value. Second, those systems may be outdated and do not address future-oriented problems.

"While solving our own problems, we have also identified some future problems. Open-sourcing an operating system is not a one-off deal; it is just the starting point. We hope to solve many problems jointly with the industry and move forward faster and further together."

The following is an edited transcript of the interview between media outlets including 36Kr and Xie Yan, CTO of Li Auto:

Question: What are the development stages of Li Auto's Starry Ring OS?

Xie Yan: The Li Auto Starry Ring OS project was launched in 2021, with an R&D team of 200 people. Over the years, cumulative R&D investment has exceeded one billion yuan, and in 2024 the self-developed operating system was installed in vehicles for the first time. The R&D of Li Auto Starry Ring OS has gone through three stages:

[Stage 1, 2021-2022: Solve the "chip shortage" problem and achieve freedom in chip selection]

• Background: Supply-chain disruption began to emerge in the second half of 2020 → it worsened across the industry in 2021 → the structural shortage continued in 2022

• Problem: Li Auto's self-developed operating system started in 2020, at the beginning of the pandemic. The global pandemic hit the chip supply chain for the first time, setting off a three-year chip shortage in 2021:

○ Extended lead times: A delivery cycle that was originally one month could stretch to six months;

○ Runaway prices: Chip prices generally rose 5 to 10 times. A typical example is a Bosch ESP chip that had cost 13 yuan each and soared to 4,000 yuan on the second-hand market;

○ Skewed allocation: NXP had a production capacity of 500,000. Leading customers such as BBA (BMW, Mercedes-Benz, and Audi) might take 60% directly, leaving Chinese automakers to share the remainder;

○ Chip switching: Adapting to a new chip generally took six months and consumed a large amount of manpower.

• Solution: With the self-developed operating system, the time needed to adapt to a new chip was cut from six months to one month, and automotive chips of various architectures were supported, achieving freedom in chip selection. This greatly eased the impact of the chip shortage on the supply chain.

[Stage 2, 2022-2024: Solve pain points such as system performance and security, and fully deploy the self-developed system in vehicles]

• Background: Commercial in-vehicle systems from suppliers focus on standardization and are hard to customize to automakers' differentiated needs. Open-source RTOS and Linux are oriented towards general-purpose scenarios and cannot meet the real-time and safety requirements of the automotive field.

• Problem: As automakers go deeper into application software R&D, they place various customized requirements on the operating system. For example, cost pressure calls for lower system resource consumption; safety pressure requires communication middleware that meets automotive safety requirements; and competitive pressure requires a system that speeds up R&D and lets problems be solved quickly.

• Solution:

○ In vehicle control systems, the self-developed system can outperform the leading supplier solutions in the industry in core performance indicators, resource consumption, and R&D efficiency.

○ In intelligent systems, open-source Linux was modified to improve security, reduce resource consumption, and add hard real-time scheduling capabilities.

○ In communication middleware, it outperforms the leading supplier solutions in the industry in communication latency, resource consumption, stability, and security, the aspects automakers care about most.

[Stage 3, 2024 to present: Build a vehicle-level AI operating system, combine hardware and software, and carry out cross-domain joint innovation]

• Background: Explosive growth in computing demand, a surge in the number of sensors and the volume of data, and an ever faster pace of R&D iteration.

• Problem: If the hardware and software in each domain operate independently, the resource utilization, cost, real-time performance, security, scalability, and innovation speed of the overall system cannot reach their optimum.

• Solution:

○ Horizontally, build a large "vehicle-level AI operating system" platform to gain a global view and enable collaborative optimization. Through innovations such as vehicle-level resource abstraction and management and end-to-end scheduling, reach the "global optimal solution".

○ Vertically, optimize the joint hardware-software architecture. Through "hardware concentration → resource pooling → service sharing", gradually move towards software-defined hardware.

Question: What core problems has Starry Ring OS solved?

Xie Yan: 1. The self-developed operating system decouples software from hardware, and different business software from each other, through hardware-software decoupling, software-software decoupling, and customized tools. This ensures that software iteration is not held back by long hardware iteration cycles and that software can be adapted quickly to different hardware.

Hardware-software decoupling: Support multiple types of hardware-software interfaces (MCAL, HAL). These interfaces isolate hardware changes: when the hardware changes, only the appropriate interface standard needs to be selected and adapted, which greatly reduces the impact on upper-layer applications.

Software-software decoupling: Decouple and isolate different business software on the MCU and support a service-oriented software architecture. This makes software functions modular and service-oriented, so each module can be iterated and upgraded independently.

Customized tools: Continuously improve the toolchain and simulation environment, simplify development dependencies, and improve efficiency.
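The hardware-software decoupling described above can be illustrated with a minimal sketch. All names here (`CanDriver`, `VendorACan`, `VendorBCan`, `send_door_lock`, and the frame ID `0x2A0`) are hypothetical and are not taken from Starry Ring OS; the sketch only shows how application code written against an abstract interface stays unchanged when the chip-specific driver underneath is swapped.

```python
from abc import ABC, abstractmethod


class CanDriver(ABC):
    """Hypothetical hardware abstraction: the only surface the application sees."""

    @abstractmethod
    def send(self, frame_id: int, payload: bytes) -> str:
        """Transmit one frame and return a trace string for illustration."""


class VendorACan(CanDriver):
    """Stand-in for a driver targeting one chip family."""

    def send(self, frame_id: int, payload: bytes) -> str:
        return f"A:{frame_id:03X}:{payload.hex()}"


class VendorBCan(CanDriver):
    """Stand-in for a driver targeting a different chip family."""

    def send(self, frame_id: int, payload: bytes) -> str:
        return f"B[{frame_id}]={payload.hex()}"


def send_door_lock(driver: CanDriver, locked: bool) -> str:
    """Application logic: depends only on the abstract interface."""
    return driver.send(0x2A0, bytes([1 if locked else 0]))
```

Switching chips in this sketch means passing a different `CanDriver` implementation to `send_door_lock`; the application function itself is not edited.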

2. With the operating system as the core, build vehicle-level resource sharing and collaboration capabilities.

• Computing power pooling: The operating system incorporates advanced virtualization technology, abstracting the underlying heterogeneous computing resources (such as CPUs and NPUs) into a unified "computing power pool". With a global resource view and an intelligent scheduling algorithm, the OS can dynamically and precisely allocate and reclaim computing resources according to the real-time needs of each application task (such as priority, computing power demand, security level, and real-time requirements), enabling computing power to be shared across domains and across applications.

• Ethernet-based in-vehicle communication: The operating system deeply integrates and manages a high-speed communication protocol stack based on in-vehicle Ethernet, with particular support for Time-Sensitive Networking (TSN). This enables low-latency, high-bandwidth data exchange and service calls between different domain controllers or computing units, providing the data path and collaboration foundation needed for computing power sharing and a service-oriented architecture (SOA).

• Smooth application migration: The operating system provides standardized APIs, rich middleware, and development toolchains, reducing the dependence of application software on specific hardware. This allows upper-layer applications and services to be deployed, migrated, or even moved dynamically between different computing nodes (whether central computing units or regional controllers) more easily, maximizing the utilization of available hardware and simplifying software lifecycle management.
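To make the pooling idea concrete, here is a minimal sketch under strong simplifying assumptions. The `ComputePool` class, the unit names, and the TOPS figures are all hypothetical, and the pool is treated as one fungible number, whereas a real scheduler would also have to account for which physical unit serves a task, its security level, and its real-time requirements. The sketch only shows priority-ordered allocation and reclamation from a shared pool.

```python
from dataclasses import dataclass, field


@dataclass
class ComputePool:
    """Hypothetical pool that sums the capacity of heterogeneous units (in TOPS)."""

    units: dict[str, float]
    allocations: dict[str, float] = field(default_factory=dict)

    @property
    def free(self) -> float:
        """Capacity not currently granted to any task."""
        return sum(self.units.values()) - sum(self.allocations.values())

    def schedule(self, requests: list[tuple[str, int, float]]) -> list[str]:
        """Grant (task, priority, tops) requests in descending priority order.

        Returns the names of tasks that could not be served.
        """
        rejected = []
        for task, _priority, tops in sorted(requests, key=lambda r: -r[1]):
            if tops <= self.free:
                self.allocations[task] = tops
            else:
                rejected.append(task)
        return rejected

    def release(self, task: str) -> None:
        """Return a task's share to the pool."""
        self.allocations.pop(task, None)
```

For example, with a 30 TOPS cockpit unit and a 100 TOPS driving unit pooled together, a 90 TOPS perception task and a 20 TOPS parking view fit at the same time, while a low-priority 40 TOPS request waits until capacity is released.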

3. The self-developed operating system is designed and optimized across the full stack, from the kernel, scheduling, and communication to the toolchain, to build an integrated solution that can handle complexity and deliver end-to-end performance:

• Hard real-time kernel: The operating system kernel needs extremely low and predictable interrupt latency and task-switching overhead. It adopts a preemptive, priority-driven hard real-time scheduling strategy to minimize task blocking and uncertainty, providing a solid real-time foundation for the upper layers.

• End-to-end deterministic scheduling: The self-developed OS needs a global view so that it can understand and manage critical task chains across multi-core heterogeneous processors (MCU/CPU/NPU) and even across different physical nodes. Through global synchronization and collaborative scheduling, it arranges computing tasks and data streams in a unified way to ensure that the entire chain from sensor input to actuator output meets its timing constraints.

• Deterministic communication: Deeply integrate and finely manage the in-vehicle Ethernet communication stack that supports Time-Sensitive Networking (TSN); optimize communication latency between processes, virtual machines, and chips; provide guaranteed bandwidth, bounded latency, and extremely low jitter for critical data streams (such as sensor data and control instructions) on the shared network; and eliminate uncertainty in the network communication link.

• Integrated tools: Self-developed tools help developers carry out end-to-end timing analysis and verification of complex real-time systems at the design stage and automatically generate optimal scheduling and communication configurations. This ensures that the system stably meets real-time and deterministic requirements in actual operation, shifting the approach from "debugging after the fact" to "guaranteeing by design".

4. Drawing on the industry's attack-and-defense landscape and security best practices, build native security capabilities into the operating system and implement them through hardware-software collaboration, forming a systematic, multi-layer protection system that covers data encryption and protection, system integrity protection, identity authentication and access control, and a trusted execution environment:

• Data encryption and protection: Use strong encryption algorithms and secure key management mechanisms (combined with various types of hardware security modules, HSMs) to ensure the confidentiality and integrity of sensitive data (such as user privacy, key configurations, and communication content) in storage and in transit, preventing data leakage or tampering.

• System integrity protection: Build a secure boot mechanism that verifies software signatures stage by stage, starting from the hardware root of trust, to ensure that the loaded operating system and key applications are trusted and have not been tampered with; at the same time, continuously monitor key system files at runtime to prevent malicious modification.

• Identity authentication and access control: Strictly verify the identity of any entity attempting to access system resources to confirm its legitimacy, and enforce fine-grained access control so that only legitimate entities can access and operate on resources.

• Trusted execution environment: Use hardware isolation technology (such as ARM TrustZone) to create a highly secure operating environment (TEE OS) within the main processor, isolated from the ordinary operating system (Rich OS). It is used for the most sensitive operations (such as handling keys and running core authentication algorithms), protecting these core assets even if the main OS is compromised.
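The stage-by-stage verification in secure boot can be sketched as a chain of trust. This is a simplified illustration rather than Starry Ring OS code: real secure boot verifies cryptographic signatures, while this sketch substitutes SHA-256 digests so that it runs with the Python standard library alone. The function names and image contents are hypothetical, and `root_digest` stands in for a value anchored in the hardware root of trust.

```python
import hashlib
import hmac


def digest(image: bytes) -> bytes:
    """SHA-256 digest of a boot image."""
    return hashlib.sha256(image).digest()


def verify_boot_chain(root_digest: bytes, stages: list[tuple[bytes, bytes]]) -> int:
    """Walk the chain and return how many stages verified before a mismatch.

    Each stage is (image, expected_digest_of_next_stage). The first image is
    checked against root_digest; every later image is checked against the
    digest carried by the stage that was verified just before it.
    """
    expected = root_digest
    for count, (image, next_expected) in enumerate(stages):
        # Constant-time comparison avoids leaking where the digests differ.
        if not hmac.compare_digest(digest(image), expected):
            return count  # boot must stop here
        expected = next_expected
    return len(stages)
```

If the kernel image is replaced, the chain verifies only the bootloader and stops, so nothing that depends on the tampered stage is loaded.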

Question: What are your thoughts on the business model? Is there a charge for Starry Ring OS?

Xie Yan: We don't expect direct returns from open-sourcing. We neither sell the automotive operating system nor interfere with how users use it. We simply hope that everyone will join in, use it freely, contribute freely, and move forward on the same path as Li Auto. That way, we can move fast enough and ultimately achieve the vision of building an intelligent system.

Question: What is the subsequent installation schedule of Li Auto Starry Ring OS in vehicles?

Xie Yan: All the models we release next will be equipped with Li Auto Starry Ring OS. Some components have already been used in previous models. Since Li Auto Starry Ring OS involves not just a single piece of hardware but a whole system with many controllers, some technologies can be used independently.

Question: You mentioned the keyword "computing power pooling". We have always heard the term "cabin-driving integration" (combining the chips for the cockpit and intelligent driving into one SoC). Can these two concepts be equated? What are the connections and differences?

Xie Yan: These two concepts are related but also different. Regarding computing power pooling, especially in the field of AI computing power, we believe that pooling is necessary. The current problem in the industry is that, for example, on Qualcomm's cockpit chips, more than half of the area is used to provide AI computing power, and on NVIDIA and Horizon chips for intelligent driving, a large amount of AI computing power is also provided, which all incur costs. So why don't we pool the AI computing power together and reuse it more efficiently? Through computing power pooling technology, we can integrate them to avoid some waste.

Moreover, not only in the intelligent driving and cockpit domains but also in the chassis and vehicle control domains, there will be a demand for AI computing power. So we need to pool