No graphics cards tonight: Jensen Huang ignites the Rubin era, with six chips delivering five times the computing power.
At CES 2026, Jensen Huang introduced the Vera Rubin supercomputing architecture to the world! Rubin delivers 5 times the inference performance and 3.5 times the training performance of Blackwell, at one-tenth the cost. It has already entered large-scale production and will ship in the second half of 2026. With no new graphics cards last night, Huang made it clear: he is all in on AI!
With a bang, the new edition of "leather-jacket Jensen" made a dazzling entrance.
The most exciting moment of the CES keynote was the official debut of NVIDIA's next-generation chip architecture, Vera Rubin!
Is global AI compute in short supply? Huang's bold answer: Vera Rubin is already in full production.
This is a next-generation compute monster that simply outclasses the previous champion, Blackwell:
inference token costs drop 10-fold, while compute performance surges 5-fold.
Even the number of GPUs needed to train MoE models is cut to a quarter.
Once, Blackwell ended Hopper; now Rubin has buried Blackwell with its own hands.
Over the nearly two-hour keynote, Huang's key points included:
Debut of the next-generation Rubin platform: six chips, with inference efficiency soaring 10-fold
An end-to-end model for autonomous driving: AlphaMayo thinks and reasons on its own, driving a full route with no human takeover
Open-sourcing of the full physical-AI suite: foundation models and frameworks
Gamers left sleepless: no graphics cards at CES 2026
As for gamers?
Sorry, there really are no new graphics cards this time.
An announcement from NVIDIA on X shattered the last hopes of PC-building enthusiasts: no new GPUs will be released at CES 2026.
That ends NVIDIA's five-year streak, running since 2021, of launching new hardware at CES.
The long-rumored RTX 50 Super series, stuck in the "production hell" of GDDR7 memory, has most likely been scrapped before launch.
Rubin makes a stunning debut: with 6 chips and 10-fold inference efficiency, the AI supercomputer becomes a factory
Last October, Huang predicted that 3 to 4 trillion US dollars will be invested in AI infrastructure over the next five years.
The large-scale production of Vera Rubin comes at just the right time.
If Blackwell broke the limits of single-card performance, Rubin solves the problem of scaling up the whole system.
From now on, compute will be as cheap as electricity, and the great explosion of AI is just around the corner!
The Vera Rubin architecture first appeared in 2024.
After a two-year wait, it has finally entered official production!
The Blackwell architecture now exits the historical stage.
In the keynote, Huang told the audience: the compute that AI demands is soaring. What should we do? Don't be afraid. Vera Rubin will solve the fundamental challenges we face!
Born for massive inference on trillion-parameter models, this platform will make compute truly low-cost, large-scale, and industrially produced.
The Rubin architecture is named after the astronomer Vera Florence Cooper Rubin.
Rubin is arguably NVIDIA's first attempt to design the CPU, GPU, network, storage, and security as a single whole.
The core idea: instead of "stacking cards," turn the entire data center into one AI supercomputer.
The Rubin platform consists of six key components.
Among them, the Rubin GPU is the heart of the platform. Equipped with a third-generation Transformer engine, it delivers 50 PFLOPS of NVFP4 compute for AI inference.
It reaches 5 times the performance of the Blackwell GPU thanks to its NVFP4 tensor cores, which analyze the compute characteristics of each Transformer layer and dynamically adjust data precision and compute paths.
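The tensor-core internals are proprietary, but the basic idea behind 4-bit formats like NVFP4 can be sketched as block-scaled quantization. The sketch below uses the published E2M1-style value grid; the block size of 16 and the plain-float per-block scale are simplifying assumptions, not Rubin's actual hardware path:

```python
import numpy as np

# Sketch of block-scaled 4-bit quantization in the spirit of NVFP4.
# The 16 representable values mirror a sign + E2M1 grid; block size and
# scale format here are illustrative assumptions.
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_POS[::-1], FP4_POS])

def quantize_fp4(x, block=16):
    """Quantize a flat array to FP4 values, one scale per block."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    # Scale each block so its largest magnitude maps to the grid max (6.0).
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_POS.max()
    scale[scale == 0] = 1.0
    # Snap every scaled element to its nearest representable FP4 value.
    idx = np.abs((x / scale)[..., None] - FP4_GRID).argmin(axis=-1)
    return (FP4_GRID[idx] * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=64)
w_q = quantize_fp4(w)
print(f"mean abs quantization error: {np.abs(w - w_q).mean():.4f}")
```

The payoff of a scheme like this is that a 4-bit element plus a shared per-block scale cuts weight and activation traffic roughly 4x versus FP16, which is where much of the per-layer precision/throughput trade-off comes from.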
The architecture also introduces a brand-new Vera CPU, purpose-built for agentic inference.
It packs 88 NVIDIA-designed Olympus cores, is fully Armv9.2-compatible, and connects over ultra-fast NVLink-C2C, executing 176 threads at full performance while doubling the I/O bandwidth and energy-efficiency ratio.
When an agentic-AI workflow or a long-running task kicks in, it puts enormous pressure on the KV cache.
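To see why, note that the KV cache grows linearly with context length: every layer stores a key and a value tensor for every token. A back-of-the-envelope calculator (the model dimensions below are illustrative assumptions, not Rubin's or any published model's specs) makes the pressure concrete:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    # Each layer caches two tensors (keys and values) of shape
    # [batch, kv_heads, seq_len, head_dim]; bytes_per_elem=2 assumes FP16/BF16.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical dimensions for a large dense transformer.
size_gb = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                         seq_len=1_000_000) / 1e9
print(f"KV cache for a 1M-token context: {size_gb:.0f} GB")  # → 328 GB
```

Hundreds of gigabytes for a single million-token session, before weights and activations, is exactly the kind of footprint that spills out of GPU HBM and into an external context-memory tier.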
To relieve the storage and interconnect bottlenecks, the Rubin architecture specifically overhauls the BlueField and NVLink subsystems, attaching storage to the compute devices externally so the overall storage pool can scale more efficiently.
The BlueField-4 DPU is a data processing unit that offloads networking, storage, and security tasks and manages the AI context-memory system.
In NVLink 6, a single chip provides 400 Gb/s of switching capacity. Each GPU gets 3.6 TB/s of bandwidth, and a Rubin NVL72 rack totals 260 TB/s, more bandwidth than the entire Internet carries.
That 3.6 TB/s, combined with in-network computing, lets the 72 GPUs in a Rubin rack work together like one super-GPU, directly cutting inference cost to 1/7.
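The rack-level figure follows directly from the per-GPU number; a quick check:

```python
per_gpu_tb_s = 3.6    # NVLink 6 bandwidth per GPU, TB/s (keynote figure)
gpus_per_rack = 72    # Rubin NVL72 rack
aggregate_tb_s = per_gpu_tb_s * gpus_per_rack
print(f"NVL72 aggregate NVLink bandwidth: {aggregate_tb_s:.1f} TB/s")  # → 259.2 TB/s
```

259.2 TB/s, which the keynote rounds to 260 TB/s.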
On stage, Huang showed off the Vera Rubin compute tray. This single tray integrates 2 Vera CPUs, 4 Rubin GPUs, 1 BlueField-4 DPU, and 8 ConnectX-9 network cards, and the whole compute unit reaches 100 PFLOPS.
Rubin's goal is to tame the training cost of MoE and trillion-parameter models. Has it succeeded? The results speak for themselves.
Training and inference efficiency soar
Test results show that the Rubin architecture trains models at 3.5 times the speed of the previous-generation Blackwell architecture (35 PFLOPS), and runs inference tasks up to 5 times faster, at a full 50 PFLOPS!
Meanwhile, its HBM4 memory bandwidth climbs to 22 TB/s, a 2.8-fold increase, and per-GPU NVLink interconnect bandwidth doubles to 3.6 TB/s.
In ultra-large-scale MoE training, Rubin needs only a quarter as many GPUs as Blackwell, with significantly lower overall energy consumption.
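The quoted multiples are internally consistent: the training and inference figures imply the same per-GPU Blackwell baseline (a sanity check on the numbers above, not an official spec):

```python
rubin_training_pflops = 35.0    # 3.5x Blackwell (keynote figure)
rubin_inference_pflops = 50.0   # 5x Blackwell, NVFP4 (keynote figure)

baseline_from_training = rubin_training_pflops / 3.5
baseline_from_inference = rubin_inference_pflops / 5.0
print(f"Implied Blackwell baseline: {baseline_from_training:.0f} PFLOPS")  # → 10 PFLOPS
```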
Behind this stand three heroes.
NVLink 6 further boosts GPU-to-GPU interconnect bandwidth, so multi-card training is no longer dragged down by communication; coordinated scheduling between the Vera CPU and the Rubin GPU cuts the idle time of "GPUs waiting for data"; and deep cooperation between ConnectX-9 and Spectrum-6 ensures large-model training is no longer limited by cluster scale.
From now on, training a trillion-parameter model is no longer about throwing money at it; it is just an engineering problem.
Training is solved. What about inference?
The results show that on the inference side, per-token inference efficiency on the Rubin platform improves by up to 10 times! For the same model and response latency, compute cost drops to one-tenth of what it was.
As a result, models can handle million-token-long contexts, and enterprise-grade AI applications become deployable.