
Nvidia open-sources a new large model: Jensen Huang doesn't want to just "sell shovels"

Xinzhiyuan (新智元) · 2025-12-17 09:52
NVIDIA has released Nemotron 3. Shifting from selling shovels to mining itself, the company has opened a new front in the AI arena.

By the end of 2025, the battle in the AI field remains highly uncertain. NVIDIA, the company that has been "selling shovels," now seems eager to enter the "mining" business itself. On December 15th, NVIDIA announced the Nemotron 3 family (Nano/Super/Ultra). The Nano version is already available, while the Super and Ultra versions are planned for release in the first half of 2026.

For a long time, the global AI industry has adhered to a clear-cut division of labor: NVIDIA, and the rest.

The "rest" includes companies like OpenAI, Meta, Google, DeepSeek, xAI, and so on.

The principle of this division is straightforward: the shovel-sellers and the shovel-users.

Recently, Google's TPUs have made it a credible competitor to NVIDIA, though it will still be difficult to shake NVIDIA's dominance in the short term.

As long as there are "gold mines" in the AI field, the shovel-sellers will always make a profit, regardless of who strikes gold.

This business model has propelled NVIDIA's market value to new heights, making it one of the most profitable technology companies globally.

However, by the end of 2025, NVIDIA seems no longer content with its role as a mere hardware provider. It wants to enter the "mining" business directly.

NVIDIA officially launched a brand-new open-source model family, Nemotron 3.

This is not just a routine product update but rather a well-planned strategic surprise.

NVIDIA is no longer satisfied with merely providing the hardware foundation. It has entered the arena directly, and with a revolutionary "ace in the hole":

Mamba architecture, MoE (Mixture of Experts), a hybrid architecture, and a 1-million-token context window.

The Nemotron 3 series of open - source models includes three specifications: Nano, Super, and Ultra.

Is Nemotron 3 just a simple imitation of OpenAI or Meta's open - source models? Or is it just a side project by Jensen Huang?

Dissecting Nemotron 3: A Frankenstein or an Ultimate Evolution?

In the AI arena, architecture determines destiny.

In the past few years, the Transformer architecture has dominated the field. It is the soul of ChatGPT, the cornerstone of Llama, and the underlying structure of all large-scale models.

However, as the model parameters increase and the application scenarios become more complex, the limitations of the Transformer architecture have become increasingly apparent: high inference costs, large memory consumption, and low efficiency when processing extremely long texts.

The Nemotron 3 family introduced by NVIDIA is not a simple Transformer model but rather a "hybrid prince" that combines the best of multiple architectures.

It boldly integrates three top-notch technologies: Mamba (state space model), Transformer (attention mechanism), and MoE (Mixture of Experts).

Among them, the Nemotron 3 Nano, with its breakthrough Mixture of Experts architecture, has a throughput four times higher than that of the Nemotron 2 Nano.

Nemotron achieves excellent accuracy through advanced reinforcement learning technology and large-scale concurrent multi-environment post-training.

NVIDIA has taken the lead in releasing a set of state-of-the-art open-source models, training datasets, and reinforcement learning environments and libraries for building high-precision, high-efficiency dedicated AI agents.

Family Lineage: More Than Just "Large, Medium, and Small"

Nemotron 3 is not a single model but a comprehensive family matrix designed to meet the requirements of all scenarios, from edge devices to cloud-based supercomputers.

According to NVIDIA's plan, this family mainly consists of three members, each with a different strategic mission:

Nemotron 3 Nano (already released): The "Special Forces" on the Edge

Parameter Scale: The total number of parameters is 30B (30 billion), but only about 3B (3 billion) parameters are activated during inference.

Core Positioning: It is the vanguard of the family, focusing on efficient inference and edge computing. It can run smoothly on consumer-grade graphics cards and even high-end laptops.

Technical Highlights: It is currently the most powerful "pocket rocket" on the market. Using a hybrid architecture, it achieves extreme throughput and is specifically designed for Agent tasks that require quick responses.

Strategic Significance: The existence of Nano is to prove the feasibility of the "hybrid architecture" and quickly capture the market of developers' desktops and edge devices.
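The "30B total / ~3B active" split comes from MoE routing: a gating network picks one (or a few) experts per token, so most expert weights sit idle on any given forward pass. Here is a minimal top-1 routing sketch in plain Python — purely illustrative, with made-up dimensions, and not NVIDIA's implementation:

```python
import random

# Toy Mixture-of-Experts (MoE) top-1 routing sketch. With 10 experts
# and one chosen per token, only ~1/10 of the expert parameters are
# active for each token -- the same idea behind Nemotron 3 Nano's
# ~30B total / ~3B active parameter split. Dimensions are invented.

random.seed(0)
N_EXPERTS, D = 10, 8

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

router = rand_matrix(D, N_EXPERTS)                 # gating weights
experts = [rand_matrix(D, D) for _ in range(N_EXPERTS)]

def matvec(m, v):
    """Multiply vector v (length rows) through matrix m (rows x cols)."""
    return [sum(m[j][i] * v[j] for j in range(len(v))) for i in range(len(m[0]))]

def moe_forward(token):
    """Route one token vector to its single best-scoring expert."""
    scores = matvec(router, token)                 # one score per expert
    best = max(range(N_EXPERTS), key=lambda e: scores[e])
    return matvec(experts[best], token), best      # only expert `best` runs

token = [random.gauss(0, 1) for _ in range(D)]
out, chosen = moe_forward(token)
print(len(out), chosen)
```

Production MoE layers route to the top-k experts with learned, differentiable gates and load-balancing losses, but the compute saving comes from exactly this per-token selection.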

Nemotron 3 Super (expected in the first half of 2026)

Parameter Scale: Approximately 100B (100 billion), with about 10B activated parameters.

Core Positioning: It serves as the central hub for enterprise-level applications and multi-agent collaboration, striking a balance between performance and cost.

Technical Upgrade: It is expected to introduce a more advanced Latent MoE technology, specifically designed for complex enterprise workflows.

Nemotron 3 Ultra (expected in the first half of 2026): Challenging GPT-5

Parameter Scale: Approximately 500B (500 billion), with about 50B activated parameters.

Core Positioning: It is the flagship of the family, capable of handling the most complex inference, scientific research, and in - depth planning tasks.

Ambition: It directly competes with closed-source models at the GPT-5 level, aiming to become the inference ceiling of the open-source community. It will demonstrate NVIDIA's training capabilities on ultra-large-scale clusters.

Nemotron 3 Nano is not just a model but also a technology verification platform, proving that "Mamba + MoE" can unleash astonishing combat power even with a small number of parameters.

Mamba Architecture: Declaring War on Transformer's "Memory Killer"

To understand the revolutionary nature of Nemotron 3, we first need to talk about Mamba.

Why would NVIDIA introduce this relatively "niche" architecture into a mainstream model?

In the world of LLMs (large language models), the Transformer is the absolute hegemon, but it has a fatal weakness: as the input sequence becomes longer, its computational complexity and memory consumption increase quadratically.

Imagine reading a book.

If you were a Transformer, reading the first page would be easy. But when you reach the 1000th page, to understand the current sentence, you would have to mentally review the relationship between every word on the previous 999 pages and the current word (attention mechanism).

This requires a huge amount of "mental capacity" (memory). When the context reaches 100,000 or 1,000,000 words, any existing GPU will be overwhelmed.
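The book analogy above can be put in numbers. A self-attention layer materializes (or at least computes) one score per query–key pair, so the work grows with the square of the context length. A back-of-envelope sketch, with illustrative figures rather than measurements of any real model:

```python
# Why full attention is "quadratic": the score matrix alone holds one
# entry per (query, key) pair, so entries grow as seq_len squared.

def attention_matrix_entries(seq_len: int) -> int:
    """Entries in a single head's seq_len x seq_len attention matrix."""
    return seq_len * seq_len

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} tokens -> {attention_matrix_entries(n):,} entries")

# At 1,000,000 tokens, one fp16 score matrix would need
# 10**12 entries * 2 bytes = ~2 TB per head -- far beyond any single
# GPU, which is why naive attention collapses at extreme context lengths.
```

Real systems avoid materializing the full matrix (e.g. tiled/fused attention kernels), but the underlying O(N²) compute remains.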

Mamba is different. Based on SSMs (state space models), it is essentially a recurrent neural network with excellent short-term memory.

Its way of reading is more like a human's: previously read content is "digested" into a fixed-size memory state, eliminating the need to constantly look back at every word.
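The "digest into a fixed-size state" idea can be caricatured in a few lines. This is not Mamba's actual selective-scan algorithm — just a toy recurrence showing that memory stays constant however long the input grows:

```python
# Toy caricature of the SSM idea behind Mamba: each new token is
# folded into a fixed-size state vector, so memory use is constant
# regardless of sequence length. (Illustrative only; real Mamba uses
# learned, input-dependent state transitions and a parallel scan.)

def run_recurrence(tokens, state_size=4, decay=0.9):
    """Compress an entire sequence into a fixed-size state vector."""
    state = [0.0] * state_size
    for t in tokens:
        # Blend the new token into the state; older content decays away.
        state = [decay * s + (1 - decay) * t for s in state]
    return state

short = run_recurrence([1.0] * 10)
long = run_recurrence([1.0] * 100_000)
print(len(short), len(long))   # both states are the same fixed size
```

The trade-off mentioned later in the article is visible here too: once compressed, individual tokens can no longer be recovered exactly from the state.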

Paper link: https://arxiv.org/pdf/2312.00752

The core advantages of Mamba are as follows:

Linear Complexity (O(N)):

Regardless of the length of the book, Mamba's inference consumption remains almost constant. Reading 10,000 words or 1,000,000 words puts almost the same pressure on the memory.

Extremely Fast Inference Speed:

Since there is no need to compute a large KV-cache (key-value cache) attention matrix, Mamba has a very high generation speed (throughput).

Potential for Infinite Context:

Theoretically, Mamba can handle extremely long sequences without overwhelming the memory.
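The memory contrast in the list above can be made concrete with rough KV-cache arithmetic. The layer count and hidden size below are generic illustrative values, not Nemotron 3's actual configuration:

```python
# Rough KV-cache sizing for a Transformer: keys and values are stored
# for every past token at every layer, so the cache grows linearly
# with context. A Mamba-style fixed-size state costs the same at any
# length. Layer/dimension values here are invented for illustration.

def kv_cache_bytes(seq_len, n_layers=32, d_model=4096, bytes_per=2):
    # Keys + values: 2 tensors per layer, each seq_len x d_model (fp16).
    return 2 * n_layers * seq_len * d_model * bytes_per

for n in (10_000, 100_000, 1_000_000):
    gb = kv_cache_bytes(n) / 1e9
    print(f"{n:>9} tokens -> ~{gb:,.1f} GB of KV cache")
```

At a million tokens this hypothetical model's cache alone runs to hundreds of gigabytes, which is exactly the pressure a hybrid Mamba design is meant to relieve.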

However, Mamba also has its limitations.

When dealing with extremely complex logical reasoning or tasks that require precise "looking back" to locate a specific information point (Copying/Recall tasks), it is not as accurate as the Transformer's Attention mechanism.

This is because some information is inevitably lost when it is compressed into the "state".

NVIDIA's solution: why choose at all when you can have both?

Nemotron 3 adopts a Hybrid Mamba - Transformer architecture.

This is a smart design:
