
Jensen Huang, Initiating the Token Era

新智元 · 2026-03-20 16:38
The annualized Token usage of global large models has exceeded one quadrillion, and AI has entered the era of Token economics.

[Introduction] Today, global large models have entered the era of a quadrillion Tokens! On the OpenRouter platform alone, at a rough average of about $1 per million Tokens, the corresponding annualized inference expenditure is about $1 billion! The "Token Economics" that Jensen Huang proposed at the GTC conference has been trending across the internet in recent days. The Token has become the new metric of the AI era!

In March 2026, a figure was born that was enough to keep all of Silicon Valley and Wall Street awake at night.

Just now, the global large models have entered the era of quadrillion Tokens!

OpenRouter announced that the annualized Token usage has exceeded one quadrillion.

One quadrillion is not a measurement from astrophysics, nor the GDP of some country. It is simply the annualized Token throughput of one AI model aggregation platform.

If you have no concept of this figure, we can use another calculation method:

Calculated at the current average market price of about $1 per million Tokens, the inference expenditure behind just one aggregator is as high as $1 billion.

Just one platform has generated $1 billion in real money. The global computing power cost has entered a new stage.
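The back-of-the-envelope math behind that $1 billion figure can be checked in a few lines (the $1-per-million rate is the article's quoted rough average, not an exact market price):

```python
# Verify the article's figure: one quadrillion Tokens per year
# at roughly $1 per million Tokens.
annual_tokens = 1_000_000_000_000_000  # 10^15, "one quadrillion"
price_per_million = 1.0                # USD, the quoted rough average

annual_spend = annual_tokens / 1_000_000 * price_per_million
print(f"${annual_spend:,.0f}")  # prints $1,000,000,000
```

One quadrillion Tokens is a million million millions, so at $1 per million the dollar figure is exactly the Token count with six zeros removed.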

Yes, Token is becoming the oil of this era.

Now, AI has penetrated into every line of code and every email through API interfaces, just like electricity, affecting almost everyone in the world.

If 2023 was the first year of model awakening, then today in 2026, we are standing in front of the figure of 1,000,000,000,000,000.

This is the largest-scale intellectual overflow in the history of human civilization.

While people are still discussing whether AI is a bubble, these 15 zeros have swept away all the skeptics like a tsunami!

Just two days before this figure was announced, Huang stood on the stage of GTC 2026 and said the word "Token" more than 70 times in a nearly two-hour speech.

Token became the anchor point and the main line running through the entire speech.

Huang was actually naming the economic foundation for a new era. He gave it a name: Token Economics.

Within just a few days, the concepts of "Token Engineering" and "Token Economics" became hits across the internet!

NVIDIA Created Token Economics

What exactly is Token?

In Huang's view, Token is no longer just a technical term. It has become a unit of computing power, a unit of information, and a unit of currency.

Token has become a big business. Huang's judgment directly reveals the underlying logic of the AI industry:

Token is the core economic yardstick in the AI era!

On this basis, the operating rules of the global AI industry may be reshaped.

Token is a unit of information and the smallest unit of AI thinking.

[Figure: schematic diagram of a Token]

From a simple chat question-and-answer session, to the generation of movie-grade AI video, to the training and inference of enterprise-level models, all the information AI processes is measured in Tokens.

It is the atom of AI "thinking" and the most basic metric of the intelligent economy.

Unit of Computing Power, a New Product in Data Centers

At GTC, Huang officially proposed the concept of the "Token Factory":

In the future, data centers will no longer store data or run software, but produce Tokens.

How many Tokens you can produce determines how much money you can earn.

However, power is a hard constraint. A 1-gigawatt facility is just 1 gigawatt, and the laws of physics are unforgiving.

So, the core of the current competition has become who can achieve the highest Token throughput per watt and the lowest production cost with the same electricity bill.

This is no different from traditional manufacturing: on the same production line, the one with the higher yield wins. It's just that the "product" has changed from chips to Tokens.

Huang announced the hierarchical pricing of Tokens on stage:

The low end is about $1 per million Tokens, the mid-range is $3 to $6, the high-end engineering grade is $45, and the real-time interactive grade is $150 per million Tokens.

The price spans a 150-fold range, with speed and quality determining the price.
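The tiered prices above can be laid out to see both the 150-fold span and what a single heavy session would cost at each grade. The tier labels follow the article; using the midpoint of the $3-$6 mid-range is my simplification:

```python
# Quoted tier prices in USD per million Tokens.
tiers = {
    "low-end": 1.0,
    "mid-range": 4.5,              # quoted as $3 to $6; midpoint used here
    "engineering": 45.0,
    "real-time interactive": 150.0,
}

# Span between the cheapest and the most expensive tier.
span = tiers["real-time interactive"] / tiers["low-end"]
print(f"price span: {span:.0f}x")  # prints price span: 150x

# Cost of a hypothetical 10-million-Token agent session at each tier.
session_tokens_millions = 10
for name, price in tiers.items():
    print(f"{name}: ${session_tokens_millions * price:,.2f}")
```

The same workload that costs $10 at the low end costs $1,500 at the real-time interactive grade, which is exactly why "speed and quality determine the price."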

The evolution curve Huang showed on stage has a clear through-line: use extreme software-hardware co-design to drive down the cost generation after generation.

Grace Blackwell delivers 35 times the throughput of Hopper at each price level. The brand-new Vera Rubin doubles that again on top of Blackwell, and after first integrating the Groq LPU, it soars another 35-fold at the super tier.

In just two years, the Token generation rate has soared from 2 million to 700 million, a 350-fold leap.

And when the cost is reduced by an order of magnitude, the consumption will explode by another order of magnitude.

Unit of Currency, a New Form of Compensation Written into Pay Stubs

This is the most explosive part and the part that really detonated public opinion during Huang's GTC speech.

Who would have thought: when ChatGPT Pro launched its $200-per-month membership at the end of 2024, people were still on the sidelines wondering "which fool would spend that much subscribing to AI."

Today, the rate of burning Tokens has evolved from "painful" to "alarming".

Huang announced on stage:

In the future, each NVIDIA engineer will need an annual Token budget.

With a base salary of hundreds of thousands of dollars, I will give Tokens worth about half of that on top of it to amplify efficiency by 10 times.

Moreover, the Token budget will also become a new bargaining chip in Silicon Valley recruiting. In interviews, engineers will ask: how many Tokens does my offer come with?

In the past, job-hoppers looked at equity and RSUs. Now they also check whether the company allocates Tokens. The Token budget is shifting from an IT expense to an HR expense.
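To make the "Tokens worth about half of base salary" idea concrete, here is an illustrative calculation. The $300,000 base salary is a hypothetical round number (the article only says "hundreds of thousands of dollars"), and the per-million prices are the tiers quoted earlier:

```python
# Hypothetical compensation package: base salary plus a Token budget
# worth about half the base, per Huang's description.
base_salary = 300_000                   # USD/year, assumed round number
token_budget_usd = base_salary * 0.5    # "worth about half of that"

print(f"Token budget: ${token_budget_usd:,.0f}/year")

# How many Tokens that budget buys per year at each quoted tier price.
for price_per_million in (1.0, 4.5, 45.0, 150.0):
    tokens = token_budget_usd / price_per_million * 1_000_000
    print(f"  at ${price_per_million}/M: {tokens:,.0f} Tokens/year")
```

At the $1 low-end tier the same budget buys 150 billion Tokens a year; at the $150 real-time tier it buys only 1 billion, which is why the tier mix matters as much as the budget's dollar size.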

The biggest variable driving all this is the Agent.

The OpenClaw, which suddenly became popular in January this year, has boosted the entire intelligent agent track.

An OpenClaw running Claude Opus 4.6 can burn through hundreds of yuan in just a few rounds of conversation.

But Huang sees the other side.

Agents burn Tokens to run; burning Tokens requires buying computing power; buying computing power requires a budget.

So, the Token budget appears in the corporate financial statements.

Every SaaS company will become an AaaS company: Agentic-as-a-Service.

Inference service providers expand capacity, cloud providers build Token factories, SaaS companies transform into Agent services, and enterprises allocate Token budgets to employees. From production to consumption, the loop is closed.

Unit of information, unit of computing power, unit of currency: three in one.

Token is no longer just a technical parameter, but the core economic yardstick in the AI era.

A computing power center producing Tokens is equivalent to printing money.

The "lobster-raising" model that cloud providers are promoting is, underneath, also a Token business.

Three Supercomputers, Betting on Three Eras

To produce Tokens, you need a factory. To consume Tokens, you need Agents.

But if the factories only exist in data centers and Agents only run in the cloud, this economics will always be a game for big companies.

Huang wants to bring it to every desktop. His method is simple - deliver it personally.

In 2016, the first DGX-1 was delivered to OpenAI, then led by Elon Musk.

Deep learning had just emerged from the laboratory, and most people were still on the sidelines.

Huang's bet: AI has a future.

In 2024, the first DGX H200 was delivered to OpenAI, now led by Sam Altman.

ChatGPT swept the world, and the Scaling Law was in the spotlight. Everyone was competing in terms of parameters and scale.

Huang's bet: The era of large-scale training infrastructure has arrived, and AGI is booming.

On March 18, 2026, just two days after the GTC keynote, Huang carried the world's first DGX Station GB300 and knocked on a laboratory door.

This time, the recipient was Andrej Karpathy: coiner of the term Vibe Coding, standard-bearer of Agentic Engineering, and the most concrete consumer of the Token economy.

20 petaflops of computing power. 784GB of memory. A model with trillions of parameters can take off directly on the desktop.

This machine requires a 20-amp circuit. It is a Token factory placed on the desk.

Karpathy took it and immediately posted a group photo on X.

This machine is simply amazing!

They said there was a mystery gift and quietly hinted that it needed to be plugged into a 20-amp outlet.

So I guessed right away that it must be very powerful.

Karpathy only uses this powerful desktop supercomputer for one thing - raising lobsters.

He announced on the spot that the first task of this monster is to run his OpenClaw intelligent agent "Dobby the House Elf claw".

Yes, Dobby