When Jensen Huang and Wu Yongming shouted out the same English word
Text | Qiu Xiaofen
Editor | Su Jianxun
On March 16 local time in San Francisco, the NVIDIA GTC 2026 conference officially kicked off. Through the two-hour, impassioned keynote by NVIDIA founder and CEO Jensen Huang ran a single keyword: Token.
△ Jensen Huang, Image source: Conference screenshot
Coincidentally, one day before GTC 2026, Alibaba announced the establishment of the Token Business Group, a core independent business group on par with Taotian E-commerce and Alibaba Cloud. Within the new group, Alibaba also spelled out the chain of "creating Tokens, delivering Tokens, and applying Tokens".
A Token is the basic semantic unit a large model uses to process text. When you input a piece of text, the model first runs a tokenizer that cuts the text into a series of Tokens; these Tokens are then converted into numerical vectors and sent to the GPU for large-scale matrix computation.
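As a minimal sketch of that first step (the article names no specific tokenizer; the open-source tiktoken library, used by OpenAI's GPT models, stands in here):

```python
# pip install tiktoken
import tiktoken

# Load a publicly documented encoding; the choice is illustrative.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens are the basic unit of large-model billing."
token_ids = enc.encode(text)   # text -> list of integer Token IDs

print(len(token_ids))          # how many Tokens this text costs
print(token_ids[:8])           # the first few Token IDs
print(enc.decode(token_ids))   # decodes back to the original text
```

Every metric discussed below is ultimately denominated in counts produced this way.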
In the past, different roles in the AI industry chain each had their own focus and evaluation criteria: underlying large-model teams watched parameter counts, upper-layer application teams watched user growth and daily active users, and cloud service teams watched compute utilization and actual revenue.
Wu Yongming, CEO of Alibaba, said internally that AI Agents are extremely dependent on Tokens and that demand is about to enter a period of "explosive growth".
The signal behind Alibaba's and NVIDIA's back-to-back emphasis on tokens: as the Agent era arrives, the two giants hope to unify the yardstick in advance across the technical, product, and business dimensions.
Perhaps in the near future, every metric in the AI field will carry the same suffix: "/1M tokens".
Trillion-Dollar Narrative: From Data Centers to Token Factories
"The inference inflection point has arrived," Jensen Huang pointed out in his GTC speech. In the past two years, the inference computing volume has increased by about ten thousand times, and the usage has increased by about 100 times. Behind this million - fold growth, the tangible change is that the role of AI has gradually shifted from perception, to generation, to inference, and finally to the ability to work.
Against this backdrop, Jensen Huang noted that the correlation between tokens and AI companies' revenue is becoming clearer: a company that secures more computing power can generate more tokens, earn more revenue, and in turn make its AI more intelligent.
In this transmission chain, NVIDIA is the biggest beneficiary, and its targets are growing increasingly aggressive.
At GTC 2025, Jensen Huang forecast $500 billion in expected purchase orders for the Blackwell and Rubin platforms before 2026. At this GTC, he set a new goal for next year: that figure will double to over $1 trillion. The ambition drew a quick response from the secondary market, at one point driving NVIDIA's stock up 4.3%.
To support this trillion-dollar growth target, Jensen Huang laid out NVIDIA's new narrative: from data centers to token factories.
△ NVIDIA's revenue composition, Image source: Conference screenshot
Jensen Huang believes that, in the future, every AI company and cloud service provider should treat token-factory efficiency as its core business metric.
He went further: tokens will become a new "commodity", repriced by throughput and interaction speed.
In his speech, Jensen Huang divided token pricing into several tiers:
Free tier: high throughput, low interaction speed; monetized mainly through advertising;
Intermediate tier ($3 per million tokens) and Advanced tier ($6 per million tokens): balanced throughput and interaction speed; the mainstream paid segment;
High-speed tier ($45 per million tokens) and Ultra-high-speed tier ($150 per million tokens): the high-premium, high-interaction segment, and the new market targeted by the Rubin architecture and its successors.
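As a back-of-the-envelope illustration of what "/1M tokens" pricing means in practice (tier prices are from the speech; the usage volume is invented for the example):

```python
# Illustrative only: tier prices from the keynote, usage figures invented.
PRICE_PER_1M = {                # USD per million tokens
    "intermediate": 3.0,
    "advanced": 6.0,
    "high_speed": 45.0,
    "ultra_high_speed": 150.0,
}

def monthly_cost(tokens_per_day: float, tier: str) -> float:
    """Cost of serving a given daily token volume for 30 days."""
    return tokens_per_day / 1e6 * PRICE_PER_1M[tier] * 30

# An agent emitting 50M tokens/day costs ~$4,500/month at the $3 tier ...
print(monthly_cost(50e6, "intermediate"))      # 4500.0
# ... and ~$225,000/month at the $150 ultra-high-speed tier.
print(monthly_cost(50e6, "ultra_high_speed"))  # 225000.0
```

The fifty-fold spread between tiers is exactly the premium Jensen Huang says interaction speed will command.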
Jensen Huang emphasized that NVIDIA's three major architectures will let customers achieve extremely high throughput in the free tier, while at the highest-value inference tier, the new architecture's throughput efficiency will improve 35-fold.
△ Jensen Huang's token economics, Image source: Conference screenshot
NVIDIA Is No Longer Just a GPU Chip Company
However, to realize its trillion-dollar token-factory ambition, NVIDIA can no longer be just the GPU company it used to be.
NVIDIA is visibly changing its product roadmap: rather than simply stacking compute, it now emphasizes the capabilities of the platform as a whole, above all inference performance.
At this GTC, Jensen Huang presented a new answer: a computing system called Vera Rubin, designed specifically for agent inference and comprising 7 new chips, 5 rack systems, and 1 supercomputer.
According to the presentation, Vera Rubin's main highlights are:
① GPU: 72 GPUs interconnected at high speed over NVLink, which both accelerates the prefill stage and safeguards response speed during token generation, where the KV Cache dominates memory traffic (see the sizing sketch after this list);
② Vera CPU: When agents call tools, the work is full of repetitive logic and conditional branching, which GPUs handle poorly. NVIDIA therefore designed the brand-new Vera CPU to act as a "dispatcher", taking over control tasks and freeing the GPU. Vera is also the world's only data-center CPU to use LPDDR5. (Author's note: low-power LPDDR5 memory is typically found in flagship smartphones.)
△ Vera CPU, Image source: Conference screenshot
③ BlueField-4 + CX-9 storage platform: AI factories must process enormous volumes of data, so NVIDIA rebuilt the storage network specifically around AI data flows;
④ CPO Spectrum-X switch: it co-packages the optical engine with the switch silicon, making it the world's first co-packaged-optics Ethernet switch and doing away with traditional pluggable modules;
⑤ A fully liquid-cooled design cuts installation time from two days to two hours.
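To make point ① concrete: the KV Cache is the per-request memory that stores attention Keys and Values across all layers, and a standard back-of-the-envelope formula shows why it dominates during token generation (the model dimensions below are illustrative, not Vera Rubin specifications):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Keys and Values cached for every layer; FP16 = 2 bytes per element."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative 70B-class dense model: 80 layers, 8 KV heads (GQA), head_dim 128.
per_request = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                             seq_len=128_000, batch=1)
print(f"{per_request / 2**30:.1f} GiB per 128k-token request")  # ~39.1 GiB
```

At tens of gigabytes per long-context request, keeping the cache close to the compute is precisely what high-speed NVLink interconnects are for.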
According to Jensen Huang, Vera Rubin will start shipping in the second half of 2026. In actual use, the system's advantages are that inference runs 5 times faster than the previous-generation Blackwell Ultra, token cost falls 10-fold, and MoE models need only 1/4 as many GPUs.
It is worth noting that NVIDIA recently acquired the Groq LPU platform and folded it into its own computing systems.
However, pairing the large-scale Vera Rubin with the compact Groq LPU inevitably raises chip-scheduling problems, so NVIDIA built a dedicated operating system called Dynamo.
Dynamo can be understood as a conductor: it routes computing tasks with different characteristics to the hardware best suited to run them, maximizing overall efficiency.
Jensen Huang's rule of thumb: if the workload is mainly high-throughput, run 100% Vera Rubin; if much of it involves high-value token generation such as coding, mix in a share of Groq chips, say 25%.
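The speech did not show Dynamo's actual interface, so the sketch below only illustrates the routing idea; every name in it is invented, not a Dynamo API:

```python
from dataclasses import dataclass

# Hypothetical illustration of Dynamo-style routing; not NVIDIA's real API.
@dataclass
class Task:
    name: str
    latency_sensitive: bool   # e.g. interactive code generation
    tokens: int

def route(task: Task) -> str:
    """Latency-sensitive, high-value generation goes to the low-latency LPU pool;
    bulk throughput work goes to the GPU pool."""
    return "groq_lpu_pool" if task.latency_sensitive else "vera_rubin_pool"

jobs = [
    Task("batch_summarization", latency_sensitive=False, tokens=2_000_000),
    Task("interactive_codegen", latency_sensitive=True, tokens=50_000),
]
for job in jobs:
    print(f"{job.name} -> {route(job)}")
```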
△ Groq 3 LPU, Image source: Conference screenshot
The Groq LP30 has already entered mass production at Samsung, with shipments expected in Q3. According to the presentation, this heterogeneous, collaborative design gives the data center a 35-fold performance leap per unit of power while still delivering ultra-low-latency, high-value inference services.
Beyond Vera Rubin, NVIDIA also previewed the next-generation GPU architecture Vera Rubin Ultra (due in 2028) and the Feynman architecture.
Broadly, 3D stacking, LPU integration, heterogeneous storage, CPO (co-packaged optics), and copper interconnect are the core technical pillars of NVIDIA's future platforms.
△ NVIDIA's platform roadmap (Blackwell, Rubin, Feynman), Image source: Conference screenshot
NVIDIA's Version of OpenClaw Is Here
Beyond laying out his ambitions, Jensen Huang devoted part of the speech to the currently red-hot OpenClaw. The most popular open-source project in human history, OpenClaw surpassed in a few weeks what Linux accumulated over three decades.
He believes OpenClaw has three core functions: managing resources (tools, large language models), decomposing problems and invoking agents, and producing output and executing across multiple modalities. OpenClaw is therefore, in essence, an agent operating system, as important as HTML or Linux.
In Jensen Huang's view, OpenClaw will reshape enterprise IT: every SaaS company will become an AaaS (Agent-as-a-Service) company, providing not only tools but also AI agents for specific domains. "An industry originally worth $2 trillion is about to grow to trillions of dollars."
However, Jensen Huang also issued a warning: once agents can freely access a company's sensitive data and code, OpenClaw poses security risks. NVIDIA therefore partnered with Peter Steinberger, the developer of OpenClaw, to launch an enterprise version, NeMo Claw.
According to the presentation, NeMo Claw not only integrates NVIDIA's complete agent toolkit but also adds a set of safeguards (network guards, privacy routing, and so on) to protect enterprise data.
△ NVIDIA's version of OpenClaw, Image source: Conference screenshot
Jensen Huang even suggested that tokens may become part of engineers' total annual compensation in Silicon Valley hiring; "joining with a token quota" will become a new recruiting topic there.
The "GPT Moment" in Graphics
At the start of the keynote, Jensen Huang first introduced DLSS 5, a neural rendering technology that he called next-generation graphics computing and the "GPT moment in graphics".
Specifically, NVIDIA's DLSS 5 consists of two parts: generative AI and probabilistic computing, layered on a foundation of 3D graphics and structured data.
Each part plays its role: 3D graphics and structured data provide a deterministic virtual-world framework that obeys physical laws, while generative AI and probabilistic computing fill in that framework, adding realistic detail and dynamic variation.
Jensen Huang said the fusion of the two will make generated content beautiful, immersive, and controllable. But to carry this new paradigm beyond gaming into industries such as finance, healthcare, and manufacturing, the problem of massive, heterogeneous data must be solved first.
Jensen Huang also shared his view of the current data landscape in the speech.
Most of the world's data today lives in databases, PDFs, audio, and video, and nine-tenths of it is unstructured. Before multimodal perception and understanding technology emerged, such data was hard to query and retrieve efficiently.
To that end, NVIDIA launched two brand-new data tools, which Jensen Huang called the most complex data-processing system in the world:
cuDF: processes deterministic structured data such as tables and logs (corresponding to the "3D graphics and structured data" half of DLSS 5 above);
cuVS: processes probabilistic unstructured data, for example converting text, images, audio, and video into semantic vectors via AI models (corresponding to the "generative AI and probabilistic computing" half of DLSS 5 above). A minimal sketch of the division of labor follows.
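cuDF and cuVS are existing RAPIDS libraries; the snippet below sketches only the structured half with invented data, and assumes a CUDA GPU with RAPIDS installed (the unstructured half would embed documents into vectors and index them with cuVS's nearest-neighbor search, omitted here):

```python
# Requires a CUDA GPU with RAPIDS installed, e.g.: pip install cudf-cu12
import cudf

# Deterministic, structured side: a pandas-style aggregation that runs on the GPU.
logs = cudf.DataFrame({
    "service":    ["api", "api", "billing", "billing"],
    "latency_ms": [12.0, 31.0, 8.0, 95.0],
})
print(logs.groupby("service")["latency_ms"].mean())
```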
These two data platforms have already been integrated into cloud and OEM systems from the likes of IBM, Dell, and Google Cloud.
Jensen Huang's two-hour speech pointed to a trend: as AI competition shifts from a "model race" to a "productivity race", the scramble will no longer be for GPUs and raw compute, as it was a few years ago, but for dominance over token production.
Specifically, NVIDIA now offers a productivity suite spanning underlying chips (Rubin/Feynman), heterogeneous architectures (GPU + LPU + CPU), system design (fully liquid-cooled cabinets), upper-layer operating systems (OpenClaw, NeMo Claw), and tools (DLSS 5, cuDF, cuVS). It hopes to make token production as efficient and industrialized as manufacturing.