
Li Auto Doubles Down on VLA: Xia Zhongpu, Head of Its "End-to-End" Model, to Leave | 36Kr Exclusive

Li Anqi, 2025-05-21 09:00
Li Auto is more determined than ever to go all in on its VLA large model.

Text | Li Anqi

Editor | Li Qin, Yang Xuan

36Kr Auto has exclusively learned that Xia Zhongpu, the head of the "end-to-end" model for Li Auto's assisted driving, will leave the company in the near future. Xia Zhongpu is ranked at Level 21 and reports directly to Lang Xianpeng, vice president of assisted driving R&D at Li Auto.

People familiar with the matter said that Xia Zhongpu has withdrawn from the VLA project team, which is building Li Auto's latest assisted driving solution, and has not attended regular business meetings for several weeks. His next move after leaving the company is not yet clear.

36Kr Auto sought confirmation of the above from Li Auto. As of press time, the company had not responded.

Xia Zhongpu joined Li Auto in 2023 and was mainly responsible for the planning and control model of the assisted driving system. Previously, Xia Zhongpu worked in Baidu's Apollo department.

The technical module Xia Zhongpu was in charge of was key to the rollout of Li Auto's "end-to-end" assisted driving solution at the time. The solution performed well, and when Li Auto reorganized its assisted driving team into three major departments in November 2024 (the "end-to-end" model, the world model, and mass production R&D), Xia Zhongpu officially became the head of the "end-to-end" model, reporting directly to Lang Xianpeng.

During his two years at Li Auto, Xia Zhongpu was promoted from P9 (equivalent to Level 19 in Li Auto's new job-grade system) to Level 21. Promotion at this pace is rare within Li Auto.

However, people familiar with the matter told 36Kr Auto that Xia Zhongpu's departure may be related to the change in Li Auto's assisted driving technology route.

"Xia Zhongpu believes that there is still room for optimization in the end-to-end route, but Li Auto has placed its bet on the VLA (Vision-Language-Action) model route," said a person familiar with the matter.

On May 7, Li Xiang, the CEO of Li Auto, said in his AI Talk that "VLA is a driver large model that works like a human driver." Li Auto has also committed three times more training cards to the effort than originally planned.

The leadership of the assisted driving team has also been given more resources. According to 36Kr Auto, Lang Xianpeng, the head of Li Auto's assisted driving, has been promoted to Level 24. The VLA technology route is led by Jia Peng, the head of assisted driving technology R&D. Jia Peng was previously also responsible for pre-research on technologies such as Li Auto's world model.

Since 2023, Li Auto's assisted driving technology route has switched several times: from a solution that relied on high-precision maps and rules, to the "end-to-end" approach, and now to the VLA model route.

The rollout of the "end-to-end" solution was the battle that made Li Auto's name in assisted driving. The "end-to-end" approach was first put into practice by Tesla. Compared with earlier solutions built on rules hand-written by engineers, it relies more on the AI model's own ability to learn, and information can pass through the "perception-prediction-planning-control" chain of the assisted driving system without loss.

Li Auto's "end-to-end" project was launched in November 2023. Because it performed so well in practice, Li Auto rolled out the "end-to-end + VLM (vision-language model)" solution to all Max version users in October 2024. "It was two months ahead of the original schedule," said a person familiar with the matter.

As a result, Li Auto shed its label as a "laggard" in assisted driving and quickly moved into the industry's top tier. Xia Zhongpu, as the person in charge of "end-to-end" mass production, earned an internal promotion as well.

However, Li Auto does not think that the "end-to-end" approach is the ultimate answer.

At the AI Talk on May 7 this year, Li Xiang, the CEO of Li Auto, explained the company's thinking behind the change in its technology route. He said that the "end-to-end" approach does not fully understand the physical world and is closer to imitation. "The end-to-end approach can handle most generalizations without problems, but it will encounter problems when facing particularly complex scenarios that it has never learned," Li Xiang said.

Although Li Auto added a VLM (vision-language model) to the "end-to-end" solution, the company still believes that the role of the VLM is limited.

Li Auto is more optimistic about the VLA (Vision-Language-Action) technology route. The VLA model was first introduced by DeepMind, Google's AI company, and was mainly used in robotics. It has since gradually become the mainstream technical paradigm and framework in the field of embodied intelligence.

Unlike vision-language models (VLMs) such as ChatGPT and Sora, VLA adds an "action" capability for interacting with the physical world. In other words, VLA not only understands the surrounding environment but can also directly output control instructions, such as robot actions or vehicle driving decisions. VLA has also been applied to assisted driving.

Li Auto believes that VLA can fully perceive the physical world by combining 3D and 2D vision, unlike VLM, which can only analyze 2D images. At the same time, VLA has a complete "brain" system with language and CoT (Chain of Thought) reasoning abilities. It can see, understand, and actually execute actions, which matches the way humans operate.

Strengthening the general world knowledge and reasoning ability of assisted driving systems is also becoming a major industry trend. NIO's recent world model solution likewise emphasizes the ability to recognize traffic signs and text. XPeng's previously released cloud-based large model also has complex chain-of-reasoning abilities and, once distilled to the vehicle, can let a large model control the car.

However, some industry insiders told 36Kr Auto that the VLA route is still at an early stage and has not yet been tested through large-scale real-world deployment. As Li Xiang himself said, "We are actually walking in an uncharted area."

With the departure of the technical leader of the old "end-to-end" route, Li Auto's determination to go all in on the VLA large model has only become firmer.