
Jensen Huang grabs the lobster: NVIDIA's new "nuclear bomb" offers a 10-fold increase in computing power, and OpenClaw is free.

机器之心 · 2026-03-17 08:15
Jensen Huang: tech companies' anxiety over computing power now adds up to $1 trillion.

At 2 a.m. Beijing time this morning, the NVIDIA GTC conference officially kicked off in San Jose, California. This year's Keynote is destined to be frequently cited by CEOs of various companies.

“We have redefined computing, just like the personal computer revolution and the Internet revolution. We are now at the beginning of a brand-new platform transformation,” said Jensen Huang, co-founder and CEO of NVIDIA.

Last October, Jensen Huang also stated that he expected global companies to spend $500 billion on the Blackwell and Rubin systems within the next five fiscal quarters ending in late 2026. Now he says this market will reach $1 trillion between 2025 and 2027, and 60% of the business will come from hyperscale cloud computing.

The figure has roughly doubled because AI has reached the “inference inflection point.” If AI was previously in its stage of frenzied training in the laboratory, it has now fully entered the stage of inference and generation. Demand for computing power is not peaking; it has only just begun to explode.

“So, is this reasonable?” Jensen Huang spent most of the remaining keynote on this question.

Mass production of the new-generation Vera Rubin, a chip the world has never seen before

This year's new product is no longer a single chip but a large and complex AI computing power system.

Jensen Huang called the NVIDIA NVL72 based on the new Vera Rubin architecture a “bold bet.” AI inference is where squeezing out maximum efficiency is hardest, and with the help of partners, NVIDIA's efforts have paid off.

The per-watt token performance of the NVL72 architecture has increased by 50 times, a pace of improvement that far exceeds Moore's Law.

This is the “token king.”

With the improvement of computing power and the development of AI technology, data centers, which used to be places for storing files, have now become factories for generating tokens. Jensen Huang pointed out that inference is the workload, and tokens are the new commodities.

In AI inference, the challenge for computing power is to handle more complex reasoning at lower latency. Higher efficiency also means higher margins for enterprises.

The Vera Rubin NVL72 is “the engine that injects powerful impetus into the era of agentic AI.” Jensen Huang demonstrated the entire Vera Rubin system on stage: a large, complex system comprising seven new chips, designed to build the world's largest AI factories and optimized for every stage of AI, from pre-training and post-training to test-time scaling and agentic inference.

NVIDIA detailed the Vera Rubin platform, including the Vera CPU, Rubin GPU, NVLink 6 switch, NVIDIA ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU.

Specifically, the Vera Rubin NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs, connected via NVLink 6 and equipped with ConnectX-9 SuperNICs and BlueField-4 DPUs. The Vera Rubin NVL72 achieves breakthrough efficiency: compared with the NVIDIA Blackwell platform, it can train large-scale mixture-of-experts models with only a quarter as many GPUs, its inference throughput per watt is 10 times higher, and its cost per token is just one-tenth.
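The claimed ratios can be made concrete with a bit of arithmetic. The sketch below is illustrative only: the Blackwell baseline figures are hypothetical placeholders, not published NVIDIA numbers; only the ratios (a quarter of the GPUs, 10x per-watt throughput, one-tenth the token cost) come from the announcement.

```python
# Hypothetical Blackwell baseline (placeholder numbers, not NVIDIA's):
blackwell_gpus = 10_000            # assumed MoE training cluster size
blackwell_tokens_per_watt = 100.0  # assumed inference efficiency
blackwell_cost_per_mtok = 2.00     # assumed $ per million tokens

# Applying the announced ratios:
rubin_gpus = blackwell_gpus / 4                          # a quarter as many GPUs
rubin_tokens_per_watt = blackwell_tokens_per_watt * 10   # 10x per-watt throughput
rubin_cost_per_mtok = blackwell_cost_per_mtok / 10       # one-tenth the token cost

print(rubin_gpus, rubin_tokens_per_watt, rubin_cost_per_mtok)
# 2500.0 1000.0 0.2
```

Whatever the real baseline is, the compounding matters: fewer GPUs for the same training run and a tenth of the per-token serving cost multiply rather than add.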

The NVL72 is designed for hyperscale AI factories and scales out seamlessly with Quantum-X800 InfiniBand and Spectrum-X Ethernet, maintaining high utilization in large GPU clusters while reducing training time and total cost of ownership.

Is there a bigger “nuclear bomb”? Yes, there is: the NVIDIA Vera Rubin Ultra NVL576. By introducing a brand-new dual-layer, fully interconnected NVLink topology, developers can scale the system up to a maximum of 576 GPUs.

The Vera Rubin Ultra NVL576 will connect eight independent MGX NVL racks, each equipped with 72 Rubin Ultra GPUs. All racks are interconnected through copper cables and direct-connect optical links to form a unified 576-GPU NVLink domain.
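The topology figures above are internally consistent, as a quick sanity check shows:

```python
# Cross-checking the NVL576 figures quoted in the announcement.
racks = 8            # independent MGX NVL racks
gpus_per_rack = 72   # Rubin Ultra GPUs per rack

nvlink_domain = racks * gpus_per_rack  # size of the unified NVLink domain
print(nvlink_domain)
# 576
```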

This system will be built on the same MGX rack-scale ecosystem, enabling the fastest possible ramp to mass production.

To verify this large - scale cross - rack NVLink topology architecture, NVIDIA internally built a fully functional prototype system based on GB200 called Polyphe, as shown in the following figure:

Of course, the latest Vera Rubin computing power will also be deployed in space.

Jensen Huang announced that NVIDIA is developing a new chip/computer called NVIDIA Vera Rubin Space-1 for orbital data centers. “There is no conduction, no convection in space, only radiation. We have to figure out how to cool these systems in space, but we have many excellent engineers working on this problem.”

NVIDIA says Vera Rubin is ramping significantly faster than the previous-generation architecture and is already being deployed on Microsoft Azure. With the launch of Vera Rubin, the turning point for AI agents has arrived, and the largest AI infrastructure build-out in history is about to begin.

New AI inference chip LPU

The powerful capabilities of Vera Rubin are inseparable from the LPU (Language Processing Unit).

In December last year, NVIDIA reached a strategic deal with the AI inference chip company Groq for approximately $20 billion, obtaining the authorization for Groq's inference technology, acquiring some of its chip assets, and recruiting core team members, including founder Jonathan Ross and President Sunny Madra.

Groq's value lies in pairing the LPU's specially optimized inference pipeline with the GPU, breaking through the bottlenecks that pure-GPU AI servers face in low-latency inference, token-decoding efficiency, and energy consumption.

At this GTC conference, the release of the NVIDIA Groq 3 LPX marked an important milestone in the field of accelerated computing.

Large-model inference has long faced a core tension: low latency and high throughput are hard to achieve at once. The Groq LPX architecture works in tandem with the Vera Rubin GPU and is specifically optimized for the low-latency, ultra-long-context inference that agentic systems require.

Under this architecture, inference throughput per megawatt increases by up to 35 times, bringing up to 10 times the revenue potential for trillion-parameter models.

Higher per-watt throughput and token-level performance open up a new tier of inference, making it feasible to serve ultra-high-end models with trillions of parameters and million-token contexts, and expanding the business opportunity for all AI service providers.

In terms of design, the LPX rack is fully liquid-cooled, built on the MGX infrastructure, and integrates seamlessly into the next-generation Vera Rubin AI factory.

Each LPX rack contains 256 LPU processors, providing 128 GB of on-chip SRAM and 640 TB/s of vertical interconnect bandwidth.
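Dividing the rack totals across the 256 LPUs gives a rough per-chip picture. Note the even split is an assumption for illustration; the announcement states only the rack-level totals.

```python
# Back-of-the-envelope per-LPU figures from the rack totals.
# Assumes resources divide evenly across LPUs (not stated in the announcement).
lpus = 256
total_sram_gb = 128   # on-chip SRAM across the rack
total_bw_tb_s = 640   # vertical interconnect bandwidth across the rack

sram_per_lpu_mb = total_sram_gb * 1024 / lpus  # MB of SRAM per LPU
bw_per_lpu_tb_s = total_bw_tb_s / lpus         # TB/s per LPU

print(sram_per_lpu_mb, bw_per_lpu_tb_s)
# 512.0 2.5
```

Keeping the working set in on-chip SRAM rather than external HBM is what makes the deterministic, low-latency decoding described below possible.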

During large - scale deployment, a large number of LPUs can work together, operating like a giant single processor to achieve high - speed, deterministic inference acceleration.

When deployed together with the Vera Rubin NVL72 system, the Rubin GPUs and LPUs compute cooperatively on every output token at every layer of the AI model, significantly improving decoding performance.

The LPX architecture is optimized for trillion-parameter models and million-token contexts. Through co-design with Vera Rubin, it strikes the best balance among power consumption, memory, and compute efficiency.

Currently, the LPU is manufactured under contract by Samsung; the next-generation LPU may move to TSMC. In addition, the Groq processor may be integrated into the future Feynman-architecture GPU, which is expected to improve performance while reducing cost.

The NVIDIA Groq 3 LPX is expected to officially launch in the second half of this year.

NemoClaw: NVIDIA's version of OpenClaw is launched

The hottest concept in the tech circle recently is OpenClaw. Jensen Huang compared it to an “operating system” at the GTC. Simply put, OpenClaw is an agent platform that can connect to cloud systems. It can generate other agents, perform scheduling, break down problems, and so on.

However, current OpenClaw-based AI agents carry security risks when communicating with the outside world. NemoClaw, launched by NVIDIA, adds enterprise-grade security protections that help safeguard sensitive information.

NVIDIA positioned OpenClaw as an enterprise-grade secure solution by adding multiple layers of security protection on top of the infrastructure built by OpenClaw founder Peter Steinberger. Jensen Huang said NVIDIA assembled “the world's top security researchers to modify OpenClaw so that it can be safely deployed within enterprises.”

He also emphasized that every enterprise now needs to formulate its own OpenClaw strategy. In Jensen Huang's view, OpenClaw and the broader Claw system will be as important as basic software facilities such as Linux, Kubernetes, and HTML in the future.

At the technical level, NemoClaw is a set of foundational software tools that make OpenClaw easier to deploy and more secure to run. Through the NVIDIA Agent Toolkit, users can install and optimize OpenClaw with a single command, which also automatically deploys the OpenShell runtime.

This runtime provides support for open - source models and an isolated sandbox environment, allowing AI agents to be constrained by security, network, and privacy policies when performing tasks, calling tools, or accessing the external network.