NVIDIA sets the benchmark for open-source large models in the United States: even Nemotron 3's training recipe is public, and the full dataset of over 10 trillion tokens has been released.
NVIDIA is being very aggressive in the open-source model space:
The "most efficient open model family", Nemotron 3, combines a hybrid Mamba-Transformer MoE architecture with NVFP4 low-precision training (a sketch of the format follows below).
Moreover, it is completely open:
Not only the model weights, but also more than 10 trillion tokens of training data, the pre-training and post-training software, and the training recipes have all been made public.
Compared with other open-source models, it delivers competitive quality while running 1.5 to 3.3 times faster.
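To give some intuition for what NVFP4 training means for weights and activations, here is a minimal fake-quantization sketch. It assumes the commonly described NVFP4 layout of 4-bit E2M1 elements with one scale per small block; the block size and scale handling here are illustrative assumptions (the real format reportedly stores per-block scales in FP8, which this sketch skips), not NVIDIA's actual training kernels.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float: 0, 0.5, 1, 1.5, 2, 3, 4, 6
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(x: np.ndarray, block: int = 16) -> np.ndarray:
    """Fake-quantize a 1-D tensor NVFP4-style: 4-bit values plus one scale per block."""
    x = x.astype(np.float32)
    pad = (-len(x)) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)

    # One scale per block so the block's largest magnitude maps to E2M1's max (6.0).
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / 6.0, 1.0)

    # Snap each scaled element to the nearest representable E2M1 magnitude.
    scaled = blocks / scale
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    deq = np.sign(scaled) * E2M1_GRID[idx] * scale
    return deq.reshape(-1)[:len(x)]

w = np.random.randn(64).astype(np.float32)
w_q = quantize_nvfp4_block(w)
print("max abs quantization error:", np.abs(w - w_q).max())
```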
Combining Mamba and Transformer
Nemotron 3 aims to maximize inference efficiency at the architectural level.
The self-attention mechanism of a traditional Transformer must attend over an ever-growing KV cache: every new token is compared against all cached keys and values, so per-token compute and memory grow with sequence length.
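As a rough illustration of that growth (the layer, head, and precision numbers below are made up for the example, not Nemotron 3's real configuration):

```python
# Toy back-of-the-envelope: with self-attention, every generated token appends one
# key and one value vector per layer, so the cache keeps growing with the sequence.
d_head, n_heads, n_layers = 128, 8, 32
bytes_per_elem = 2  # fp16

def kv_cache_bytes(seq_len: int) -> int:
    # 2 tensors (K and V) x layers x heads x head_dim x tokens x bytes
    return 2 * n_layers * n_heads * d_head * seq_len * bytes_per_elem

for seq_len in (1_000, 10_000, 100_000):
    print(f"{seq_len:>7} tokens -> {kv_cache_bytes(seq_len) / 1e9:.2f} GB KV cache")
```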
NVIDIA's solution is to replace most self-attention layers with Mamba-2 layers. A Mamba layer only keeps a fixed-size state during generation, so its per-token cost does not depend on sequence length.
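A minimal sketch of why that helps, using a generic linear state-space recurrence rather than Mamba-2's actual selective-scan kernel: each decoded token updates a fixed-size state, so decode-time memory stays constant no matter how long the sequence gets. All dimensions and matrices below are toy values for illustration.

```python
import numpy as np

# Toy recurrence in the spirit of a state-space (Mamba-style) layer during decoding:
# the layer carries only a fixed-size state, with no cache that grows per token.
d_model, d_state = 1024, 128
rng = np.random.default_rng(0)
A = rng.standard_normal((d_state, d_state)) * 0.01   # state transition (toy)
B = rng.standard_normal((d_state, d_model)) * 0.01   # input projection (toy)
C = rng.standard_normal((d_model, d_state)) * 0.01   # output projection (toy)

state = np.zeros(d_state)                 # fixed-size state, independent of seq length
for t in range(10_000):                   # decode 10k tokens...
    x_t = rng.standard_normal(d_model)    # stand-in for this token's hidden vector
    state = A @ state + B @ x_t           # state update: O(1) memory per step
    y_t = C @ state                       # layer output for this token

print("state shape after 10k tokens:", state.shape)  # still (128,)
```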