Doubao on Wheels 2.0: Connect the entire vehicle with a single AI brain

晓曦 2026-04-24 22:09
In this new battlefield of automotive AI cockpits, the industry is already on high alert.

Last year, ByteDance shook the mobile phone industry with its deeply customized "Doubao Phone". Now, the product paradigm of the Doubao Phone Assistant is starting to spread to the automotive field.

On the opening day of the Beijing Auto Show, Volcano Engine, a subsidiary of ByteDance, released a new generation of automotive AI solutions based on the Agentic AI architecture. It comprises two offerings: the AI cockpit suite and the Doubao Cockpit Assistant.

The former is a full-stack set of AI cockpit capabilities that automakers can flexibly choose from. The latter, the Doubao Cockpit Assistant, is the killer product for cockpit interaction that ByteDance is fully committed to building. It is a more complete, product-level delivery that is interconnected with the Doubao app and evolves in step with it. It will enter mass production and be installed in vehicles within this year.

In terms of voice interaction, the Doubao Cockpit Assistant will free the in-car voice assistant from the earlier command-based, question-and-answer style of interaction. Instead, it will understand fuzzy natural language, break it down into the corresponding commands, and execute vehicle control operations.

It is reported that the Doubao Cockpit Assistant will also support full-duplex dialogue: the driver and the vehicle can converse in real time, speak whenever they want, and interrupt at any time, with no wake-up word required.

That is not all. Functionally, the voice assistant does not stop at simple vehicle controls such as adjusting the temperature or playing music; it also covers the entire travel service, including driving, route planning, and entertainment. For example, simply say to the Doubao Cockpit Assistant, "Park in the parking space closest to the entrance", and the system will call on the assisted driving system to park in that space automatically.

In addition, the Doubao Cockpit Assistant can act as a "sightseeing tour guide": it automatically recognizes beautiful scenery along the way and proactively recommends scenic routes. For example, during the journey it will prompt, "Turn right and you can pass by the cherry blossom avenue, which will take 4 more minutes." Once the passengers agree, it will automatically slow down and open the windows.

Multiple industry insiders revealed to 36Kr that Volcano Engine has formed a cross-departmental special team internally to push the project forward. "It is expected that the complete capabilities of the Doubao Cockpit Assistant will be implemented in the second half of this year."

To be clear, ByteDance is not moving into car manufacturing. Its aim is to co-create deeply with automakers, deliver a generational leap in the cockpit interaction experience, and achieve true end-to-end AI across the entire chain.

The trend of bringing large AI models into the automotive cockpit started with Tesla. After Elon Musk put Grok, the large model developed by xAI, into the vehicle, it delivered a generational improvement in natural language interaction, along with significant gains in intent understanding, personalization, and memory. It can also plan routes and select POIs (Points of Interest) automatically from natural language instructions.

According to 36Kr, during this year's CES, the chairman of a leading automaker was deeply struck by a video of his employees trying Tesla's Grok on-site, and immediately decided to deploy a large model in the company's vehicles this year. An internal team was quickly assembled and produced the first version in just two months.

The "Grok + FSD" combination, or more broadly the entry of large models into the cockpit, is strongly stimulating the nerves of automakers. After years of fierce competition, the intelligent driving experience has become nearly mature and even homogeneous, while cockpit interaction has been relatively quiet. The arrival of large models in the cockpit has reawakened the imagination about connecting interaction, vehicle control, and services.

"Almost all automakers are deploying cockpit intelligent agents," an industry insider revealed. On the opening day of this year's Beijing Auto Show, many automakers such as Geely and Li Auto unveiled their corresponding products.

The technological inflection point for cockpit intelligent agents came at the end of 2024. With the launch of S2S (Speech-to-Speech, end-to-end voice), popularized by ChatGPT, voice interaction latency dropped sharply. This provided the foundation for real-time natural dialogue in the cockpit and favorable conditions for the automotive industry to explore super agents for cockpit interaction.

36Kr has exclusively learned that Volcano Engine is among the companies making significant investments, and that it is cooperating deeply with a leading, high-profile automaker. According to 36Kr, this automaker has paid Volcano Engine hundreds of millions in development fees, and ByteDance has transferred personnel from both Volcano Engine and Doubao to form a dedicated project team.

In addition to Doubao, Alibaba's Tongyi Qianwen is also moving quickly into the cockpit Agent market. It is cooperating deeply around the Qualcomm 8797 to promote the deployment of edge-side models in vehicles. Clearly, in this new battlefield of automotive AI cockpits, tech giants and automakers are already on high alert.

Volcano Engine Forms a Special Team to Focus on End-to-End AI Assistant

The integration of large models into vehicles is not new, but in just one year, the product form has completely changed.

When DeepSeek became popular last year, there was a wave of integrating large models into vehicles. At that time, most automakers accessed large models through the cloud. For example, the Doubao large model mostly reached vehicles through Volcano Engine, which opened APIs for automakers to call and adapt. Many automakers, including BYD, Mercedes-Benz, and SAIC, did it this way.

However, the results were not ideal. "After we integrated it and tried it, it couldn't even pronounce the basic wake-up words correctly," an R&D engineer at an automaker told 36Kr Auto. As a result, integrating the model only improved Q&A ability and did not deliver Agent capabilities.

The problem lies not only in the technology but also in the cooperation mechanism. This time, Volcano Engine and automakers have adopted a new cooperation model.

36Kr has learned that Volcano Engine and its automaker partner have set up a special team to drive the Doubao Cockpit Assistant project. The automaker provides the vehicle platform and implementation capabilities and invests hundreds of millions in funding, while handing the lead on developing the new cockpit interaction to Volcano Engine.

In addition, in this project Volcano Engine has deployed a model at roughly the 30B scale on the vehicle side. It handles full-domain perception: visual, voice, and environmental information is fed in continuously so that the system is "always online". In the cloud, 3 to 4 core Agents run, responsible for tasks such as cabin-driving coordination, driving experience, comfort control, and emotional interaction.

On this basis, the system can carry out full-duplex voice dialogue: communication between the user and the system no longer proceeds strictly turn by turn, but can be interrupted, interjected into, and resumed at any time, approaching natural dialogue between people.
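To make the idea of interruptible, full-duplex dialogue concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not Volcano Engine's implementation: the function names (`speak`, `listen_for_barge_in`, `full_duplex_turn`) are hypothetical, speech output is simulated word by word, and the user's interruption is simulated by a fixed number of event-loop ticks rather than real audio.

```python
import asyncio


async def speak(words, spoken):
    """Simulated speech output: emit one word per event-loop tick."""
    for word in words:
        spoken.append(word)
        await asyncio.sleep(0)  # yield so the listener can run concurrently


async def listen_for_barge_in(ticks, speaking_task):
    """Simulated listener (assumption): the user interrupts after `ticks` ticks."""
    for _ in range(ticks):
        await asyncio.sleep(0)
    speaking_task.cancel()  # barge-in: stop the assistant mid-sentence


async def full_duplex_turn(reply, barge_in_after_ticks):
    """Speak and listen at the same time; report what was actually said."""
    words = reply.split()
    spoken = []
    speaking = asyncio.create_task(speak(words, spoken))
    listening = asyncio.create_task(
        listen_for_barge_in(barge_in_after_ticks, speaking)
    )
    await listening
    try:
        await speaking
        interrupted = False
    except asyncio.CancelledError:
        interrupted = True
    return words, spoken, interrupted


def run_demo():
    reply = "Turn right to pass the cherry blossom avenue it adds four minutes"
    return asyncio.run(full_duplex_turn(reply, barge_in_after_ticks=3))
```

The point of the sketch is that output and input run as concurrent tasks, so the listener can cut the reply short instead of waiting for the turn to end. A real system would replace the tick counter with voice activity detection on the microphone stream.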

"The advantage of deploying a large model on the edge side is that it can access all local files and has operation permissions for local apps," a cockpit R&D engineer said.

It is worth noting that "intelligent computing power" in the automotive industry was previously concentrated mainly in intelligent driving. Deploying a 30B-level large model on the vehicle side is almost unprecedented. Even for intelligent driving models, the known upper limit in the industry is currently around 4B.

A source revealed to 36Kr that, to run this very large model on the vehicle side, Volcano Engine customized NVIDIA's Thor Z chip. "It is customized and optimized for memory and bandwidth," and it is deployed on the vehicle as an "external" computing box called the AI Box. Volcano Engine also disclosed this hardware solution at the auto show.

Public information shows that NVIDIA released Thor, its central computing chip for cabin-driving integration, in Q4 2024, with product lines including Super/X/S/U/Z. Thor Z is the entry-level version, with single-chip computing power of 360 TOPS.

"This is more like running a real-time video stream system on the vehicle, an attempt regardless of cost, but it is difficult to implement at the commercial level," an industry insider commented. According to his calculation, if users frequently use voice interaction, Agent scheduling, and visual perception capabilities, the monthly model and computing power cost per vehicle may exceed 10 yuan, far higher than the current cost of in-vehicle services.

In addition, both parties still need to overcome further problems at the engineering level. "The two parties are cooperating on a new vehicle model platform, and the automaker itself does not have OS capabilities, which means re-creating an in-vehicle platform. Almost all the apps need to be re-adapted, and even the map needs to be deeply customized. This is a very complex and long-cycle task," an industry insider said.

Volcano Engine has also invested heavily in this project. "A team of hundreds of people has been involved, and at the same time it is working closely with leading Internet application service providers such as Meituan and Gaode Map. Each app takes several months to repackage," an informed source said.

One Vehicle, One AI Brain

Interaction at the vehicle level has always been complex. It includes safety-critical functional interactions, entertainment interactions unrelated to safety, and cross-domain interactions involving intelligent driving.

This fragmentation of usage scenarios, together with cross-departmental resistance, makes it difficult for automakers to break through these limits and explore a whole-vehicle AI brain. Volcano Engine believes that in a closed cockpit scenario, interaction through a unified AI brain is the more thorough solution.

According to the relevant person in charge, in the new generation of automotive AI solutions Volcano Engine integrates three engines, namely the dialogue reasoning engine, the goal-driven engine, and the learning and growth engine, into a unified "automotive brain".

Moreover, the AI architecture paradigm is also changing. OpenClaw became popular quickly after the Spring Festival. Automakers generally believe that AI should no longer enter the vehicle as a simple chatbot but needs to be a "master key" in a new task paradigm.

Volcano Engine has put forward a more specific approach. Unlike the "turn-based" interaction mode of a chatbot, under the new AI architecture paradigm Agentic AI is self-driven: it can perceive the environment in real time, receive feedback, and continuously learn and iterate on its own. Relying on a powerful large-model base, it can link global knowledge and diverse tools to autonomously advance tasks with clear goals; it can also review its own execution results to achieve continuous evolution.

Breaking it down, the "dialogue reasoning engine" enables communication as natural as talking to a real person, doing away with cold, mechanical "turn-based" Q&A. By combining edge-side large-model rejection (filtering out speech not meant for the assistant), VLM recognition, and the same ASR capability as the Doubao input method, it supports always-on, wake-word-free, multi-person dialogue, and the assistant joins the conversation naturally when it should. Built on what is described as an industry-leading dual-stream full-duplex capability, it allows real-time dialogue between people and the vehicle, with interruptions and interjections at any time.

The "goal-driven engine" can autonomously call tools across the vehicle based on task goals and environmental feedback, and truly get things done like a person. It can handle complex, multi-step, cross-scenario tasks from start to finish without the user having to explain repeatedly. For example, based on the state of the child in the back row and on-vehicle memory, it can choose an appropriate way to keep the child company throughout the journey, such as singing, playing cartoons, telling stories, playing games, or lulling the child to sleep.

The "learning and growth engine" can continuously summarize experience along the way, as a person would, and improve itself. It is not limited to basic memory of preferences, topics, and scenarios; it can also distill experience from task execution into reusable Skills.

Through the in-depth integration of AI and the vehicle, Volcano Engine will work with automakers to create a vivid, intelligent, and universal cockpit user experience, making the vehicle more like a person: free and emotional in communication, highly intelligent and ever-improving at problem solving, and as simple to operate as human instinct.

Of course, the Volcano Engine team also has a sober understanding of the engineering complexity of in-vehicle interaction. The relevant person in charge of the team said in an interview that the project priority of the Doubao AI Cockpit Assistant is to get basic capabilities such as vehicle control right.

"In the process of doing vehicle control, we have also iterated through several versions. Gradually, we found that we need to connect more vehicle control capabilities. From more than 100 at the start, to hundreds, and then to thousands, we have to work out how to converge, how to avoid hallucinations, and how to achieve the result users expect."

This requires strengthening edge-side capabilities. In the past, the Doubao model was deployed almost entirely in the cloud. According to reports, the Doubao large model has now also been deployed on the edge side, in the vehicle.

In addition to stronger edge-side capabilities, it also depends on the model's ability to learn and evolve on its own. According to the business person in charge, a complex scenario may be composed of multiple tool calls. "In fact, self-learning essentially means that while the user is using the system, I use the model to extract the know-how (knowledge points) of this scenario by itself, then store that know-how back to guide how my model calls tools in this scenario, or how the sequence and timing change. That is the real self-learning. So I think that most of the so-called self-learning on the market now may only involve a little iteration in a certain field, not complete evolution in the real sense."
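The loop described in that quote (extract know-how from real usage, store it, and let it guide later tool calls in the same scenario) can be sketched roughly as follows. This is a simplified illustration in Python under stated assumptions, not Volcano Engine's design: `KnowHowStore`, the scenario label, and the tool names are all hypothetical, and "know-how" is reduced to the tool-call sequence that has succeeded most often.

```python
from collections import Counter, defaultdict


class KnowHowStore:
    """Hypothetical store of tool-call sequences that worked, per scenario."""

    def __init__(self):
        # scenario name -> Counter of successful tool-call sequences
        self._sequences = defaultdict(Counter)

    def record(self, scenario, tool_calls, succeeded):
        """Keep a sequence only if the user's goal was actually achieved."""
        if succeeded:
            self._sequences[scenario][tuple(tool_calls)] += 1

    def suggest(self, scenario, default):
        """Return the most frequently successful sequence, else the default plan."""
        seen = self._sequences.get(scenario)
        if not seen:
            return list(default)
        best_sequence, _count = seen.most_common(1)[0]
        return list(best_sequence)


store = KnowHowStore()
# Tool names below are illustrative placeholders, not real vehicle APIs.
store.record("child_restless", ["play_cartoon", "dim_lights"], succeeded=True)
store.record("child_restless", ["play_cartoon", "dim_lights"], succeeded=True)
store.record("child_restless", ["sing_song"], succeeded=True)
store.record("child_restless", ["tell_story"], succeeded=False)

learned_plan = store.suggest("child_restless", default=["ask_user"])
fallback_plan = store.suggest("unknown_scenario", default=["ask_user"])
```

Here `learned_plan` is the sequence that succeeded most often, and an unseen scenario falls back to the default. A production system would presumably store richer know-how (timing, conditions, user feedback) and feed it into the model's planning prompt rather than replaying a fixed sequence.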

Volcano Engine's accumulation in the automotive industry over the past few years has also helped the team quickly build an understanding of vehicle-level interaction. According to data disclosed by Volcano Engine, more than 7 million intelligent vehicles are currently equipped with the Doubao large model, and the installation volume ranks among the top in the industry. During this auto show, many heavyweight new cars equipped with the Doubao large model were unveiled, such as the Mercedes-Benz all-electric GLC, SAIC Audi E7X, SAIC Volkswagen ID. ERA 9X, Chery Exeed EX7, FAW Hongqi HS6 PHEV, Buick E7, and Roewe's new series "Jiayue".

As end-to-end AI capabilities are gradually implemented in the cockpit, the automotive industry is clearly set for a new wave of AI technology.

Automakers Rush to the Cockpit AI Market, and Tech Giants Compete to Enter

Volcano Engine and automakers are still building their "showcase". Meanwhile, Alibaba's Tongyi Qianwen is binding itself deeply to the Qualcomm 8797 platform to promote the large-scale deployment of edge-side large models in the new generation of cockpits. Qualcomm 8797/8397 is the fifth-generation automotive-grade chip for cabin-driving integration, launched in 2024 and targeting NVIDIA's Thor series, with a maximum single-chip computing power of 640 TOPS.

36Kr has learned that the edge-side model promoted by Qianwen is about 4B in scale. Automakers such as BYD, GAC, Li Auto, and XPeng are all in contact. This means that on the cockpit battlefield, Doubao and Qianwen are facing off again. Qianwen is mainly bound to the Qualcomm 8797, while Doubao promotes the NVIDIA-based AI Box form, through which NVIDIA has entered the cockpit market from the intelligent driving field.

The "soul question" is once again in