
NVIDIA has made its core autonomous driving technology public. The R&D was led by Wu Xinzhou, and both the VLA large model and the massive dataset are free to use.

Zhidongxi, 2025-12-03 18:50
NVIDIA unveils its "maxed-out account" for autonomous driving to tackle long-tail scenarios.

Everyone can use it. NVIDIA open-sources the VLA autonomous driving model.

Recently, NVIDIA's research team officially released and open-sourced a new Vision-Language-Action (VLA) model, Alpamayo-R1 (AR1), and announced plans to open-source some of the model's core datasets in future updates.

▲The dataset corresponding to Alpamayo-R1 has been uploaded to the open-source community.

The dataset corresponding to the model has also been uploaded to the open-source community, with a total size of approximately 100 TB. This is also the first time NVIDIA has open-sourced a VLA model.

On data licensing, NVIDIA has clarified that the dataset can be used for both commercial and non-commercial purposes. This may mean that companies with little prior accumulation in VLA technology can quickly start VLA development on the back of NVIDIA's work.

▲Alpamayo-R1 model architecture

This move not only breaks down the closed walls of high-end autonomous driving models but also marks a new stage in end-to-end autonomous driving technology, moving from simple "behavior imitation" to in-depth "causal thinking."

For the autonomous driving industry, the emergence of Alpamayo-R1 directly addresses the most pressing pain point at present: safety in long-tail scenarios.

Alpamayo-R1 aims to end this dilemma, and its test results are quite convincing.

▲Alpamayo-R1 shows significant improvement compared to the baseline.

In tests on extremely difficult long-tail scenarios, AR1's planning accuracy increased by a full 12% over a baseline model that only predicts trajectories;

In closed-loop simulation tests, AR1 has successfully reduced the accident rate of vehicles running off the road by 35%;

The rate of dangerous close encounters with other vehicles or pedestrians has also been significantly reduced by 25%.

More remarkably, even after integrating a complex reasoning brain, the model maintains an end-to-end latency of just 99 milliseconds on NVIDIA RTX 6000 Pro Blackwell in-vehicle hardware, fully meeting the strict real-time requirements of autonomous driving.

01.

Solving the end-to-end black-box problem in autonomous driving

Introducing the causal chain dataset

In the past few years, although end-to-end large models based on imitation learning have made significant progress by piling up large amounts of data, they are essentially more like "black boxes" that only memorize mechanically.

These models can accurately imitate the operations of human drivers but lack causal understanding of the scenarios. They know "to brake when there is a car in front" but don't know "why to brake."

This defect, knowing the result without understanding the reason, means vehicles often perform poorly and show self-contradictory decision logic when facing complex, high-risk road conditions they have never encountered before.

The ability of VLA models to bring "world knowledge" into the vehicle is currently one of the recognized ways to break through the long-tail problem of L4 autonomous driving.

▲Li Auto's VLA model architecture

However, VLA models not only suffer from problems such as hallucination and latency, but also place extremely high demands on computing power, algorithms, and datasets. At present, only leading companies such as XPeng, Li Auto, Xiaomi, and DeepRoute.ai are pushing VLA into production vehicles.

In terms of open-source projects, apart from NVIDIA's AR1 this time, only academic projects such as OpenDriveVLA are being iterated.

Therefore, NVIDIA open-sourcing its VLA model and dataset lands like a bombshell, bringing real change to VLA R&D and deployment.

Looking at NVIDIA's project specifically: to make AI truly learn to think like an experienced human driver, NVIDIA did not make minor improvements to an existing model, but instead started a rebuild from the most fundamental layer, data construction.

▲Demonstration of causal chain reasoning

To solve the problems of vague descriptions and lack of logical associations in traditional datasets, the research team built a brand-new "Chain of Causation (CoC)" dataset.

The core of this dataset is to teach the model a strict logical loop of "observation, cause, decision." The AI is no longer allowed to generate irrelevant commentary like "the weather is clear and the road is wide."

Under this scheme, a label clearly states: "Because a vehicle is forcibly cutting in from the left and a pedestrian is crossing the road ahead, I decide to slow down and yield."

This way of constructing data not only eliminates causal confusion but also effectively improves the model's logical consistency.
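The article does not reproduce the dataset's actual schema, but the "observation, cause, decision" structure can be made concrete with a small sketch. All field names, the waypoint format, and the to_prompt helper below are illustrative assumptions, not the real Alpamayo-R1 format:

```python
# Illustrative sketch of a Chain-of-Causation (CoC) training record.
# Field names and structure are assumptions; the actual Alpamayo-R1
# dataset schema is not reproduced here.
coc_record = {
    "observations": [
        "a vehicle in the left lane is cutting in",
        "a pedestrian is crossing ahead",
    ],
    "cause": "both agents shrink the safe corridor in the ego lane",
    "decision": "decelerate and yield",
    "trajectory": [  # (x, y) waypoints in the ego frame, meters (assumed)
        (0.0, 0.0), (4.8, 0.1), (8.9, 0.2), (12.3, 0.3),
    ],
}

def to_prompt(record: dict) -> str:
    """Render a record as an 'observation -> cause -> decision' caption."""
    obs = " and ".join(record["observations"])
    return (f"Because {obs} ({record['cause']}), "
            f"I decide to {record['decision']}.")

print(to_prompt(coc_record))
```

The point of such a record is that the rationale and the trajectory are stored together, so a model can be trained, and later audited, on whether the two agree.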

02.

Introducing a new architecture to balance model performance

With strong data support, Alpamayo-R1 adopts a modular and efficient architecture design, skillfully balancing "slow thinking" and "fast action."

Its "brain" is driven by the Cosmos-Reason vision-language model specially developed by NVIDIA for physical AI, which is responsible for handling complex environmental understanding and logical reasoning.

The actions, in turn, are generated by an action-expert decoder based on flow matching.

This division of labor lets the model draw on the broad knowledge of large language models for deep thinking, while the flow-matching decoder generates smooth driving trajectories that respect vehicle dynamics, addressing the slow-reaction problem common to large models.
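To make this division of labor concrete, here is a minimal PyTorch sketch of the "reasoning brain plus flow-matching action decoder" pattern. The class names, dimensions, sampling steps, and the small MLP standing in for the Cosmos-Reason VLM are all assumptions for illustration; only the overall split reflects the design described above:

```python
import torch
import torch.nn as nn

class FlowMatchingActionDecoder(nn.Module):
    """Minimal flow-matching trajectory decoder (illustrative, not NVIDIA's code).

    Learns a velocity field v(x_t, t | cond) and generates a trajectory by
    integrating it from Gaussian noise with a few Euler steps.
    """

    def __init__(self, cond_dim: int = 256, horizon: int = 8, dim: int = 2):
        super().__init__()
        self.horizon, self.dim = horizon, dim
        self.net = nn.Sequential(
            nn.Linear(horizon * dim + cond_dim + 1, 512),
            nn.SiLU(),
            nn.Linear(512, horizon * dim),
        )

    def velocity(self, x, t, cond):
        return self.net(torch.cat([x, cond, t], dim=-1))

    @torch.no_grad()
    def sample(self, cond, steps: int = 8):
        b = cond.shape[0]
        x = torch.randn(b, self.horizon * self.dim)   # start from noise
        for i in range(steps):                        # Euler integration
            t = torch.full((b, 1), i / steps)
            x = x + self.velocity(x, t, cond) / steps
        return x.view(b, self.horizon, self.dim)      # (x, y) waypoints


class ToyVLADrivingPolicy(nn.Module):
    """Reasoning 'brain' + action decoder split, as described in the article.

    A small MLP stands in for the Cosmos-Reason VLM; only the interface
    (scene features in, conditioning embedding out) is the point here.
    """

    def __init__(self, feat_dim: int = 128, cond_dim: int = 256):
        super().__init__()
        self.reasoner = nn.Sequential(nn.Linear(feat_dim, cond_dim), nn.SiLU())
        self.action_decoder = FlowMatchingActionDecoder(cond_dim=cond_dim)

    def forward(self, scene_features):
        cond = self.reasoner(scene_features)       # slow "thinking" path
        return self.action_decoder.sample(cond)    # fast action generation


policy = ToyVLADrivingPolicy()
waypoints = policy(torch.randn(1, 128))
print(waypoints.shape)  # torch.Size([1, 8, 2])
```

The design choice the sketch illustrates: the expensive reasoning pass produces one conditioning embedding, and the lightweight decoder turns it into a trajectory in a handful of integration steps, which is what keeps latency low.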

However, what really makes Alpamayo-R1 stand out is the reinforcement learning (RL) mechanism introduced during the training phase.

▲High consistency between reasoning and action will increase the reward.

After supervised learning teaches the model basic driving skills, the researchers introduced a more demanding "examiner": a larger-scale reasoning model that acts as a critic and scores AR1's performance.

The training goal at this stage is very clear: the model must match its words with its actions.

Accordingly, the reward function values not only driving safety but also whether the reasoning the model states matches the driving actions it actually takes.

If the model reasons "stop because of a red light" but actually accelerates, it is severely penalized.
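The article does not give the actual reward formulation, but the idea it describes, rewarding safety and reasoning-action consistency while hard-penalizing mismatches, can be sketched like this (all weights, names, and the penalty value are hypothetical):

```python
def ar1_style_reward(safety_score: float,
                     stated_action: str,
                     executed_action: str,
                     critic_consistency: float,
                     w_safety: float = 1.0,
                     w_consistency: float = 1.0,
                     mismatch_penalty: float = 5.0) -> float:
    """Illustrative reward shaping in the spirit of the article; the weights
    and the hard penalty are assumptions, not the paper's formulation.

    - safety_score:       0..1, from the simulator / safety checker
    - critic_consistency: 0..1, a larger reasoning model's judgment of how
                          well the stated rationale matches the trajectory
    """
    reward = w_safety * safety_score + w_consistency * critic_consistency
    # Hard penalty for the "says brake, then accelerates" failure mode.
    if stated_action != executed_action:
        reward -= mismatch_penalty
    return reward

# The red-light example above: stated "stop", actually accelerated.
print(ar1_style_reward(0.4, "stop", "accelerate", 0.1))  # strongly negative
```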

▲Quality improves significantly after adopting the new reinforcement learning scheme.

This training method turns the AI's explanations from a perfunctory afterthought into a genuine guide for the vehicle's actions: reasoning quality improved by 45%, and reasoning-action consistency rose by 37%.

There is also a small Easter egg at the end of the paper: the first person mentioned in the acknowledgments is Wu Xinzhou, the head of NVIDIA's autonomous driving effort.

▲Wu Xinzhou is the first in the acknowledgments.

Wu Xinzhou is well known in autonomous driving circles. Before joining NVIDIA, he was Vice President of Autonomous Driving at XPeng Motors.

In August 2023, Wu Xinzhou officially joined NVIDIA as Vice President of Automotive, reporting directly to CEO Jensen Huang. He is now fully responsible for the R&D and deployment of NVIDIA's autonomous driving software and algorithms.

03.

Conclusion: NVIDIA's first open-source VLA model

The release and open-sourcing of Alpamayo-R1 is not just about releasing a high-performance model itself. For the autonomous driving industry, this may be the beginning of a reshuffle.

For a long time, the R&D threshold for advanced end-to-end autonomous driving has been extremely high, leaving it in the hands of giants with massive data and computing power.

By open-sourcing AR1 and its dataset, NVIDIA is effectively handing the entire industry a set of "correct answers" for L4 autonomous driving. This lowers the entry threshold for small and medium-sized manufacturers and research institutions, and may spawn a batch of autonomous driving solutions fine-tuned from AR1.

For NVIDIA itself, this move also embodies its "hardware-software integration" strategy: the performance AR1 demonstrates relies on NVIDIA's GPU computing power and the supporting Cosmos toolchain.

By defining the most advanced software paradigm, NVIDIA is subtly locking in the future hardware market.

This article is from the WeChat official account "CheDongXi," author: Janson, editor: Zhihao. Republished with permission from 36Kr.