NVIDIA's AI supercomputer goes on sale for $3,999: large-parameter open-source models can be deployed "in the palm of your hand".
NVIDIA's personal AI supercomputer, the DGX Spark, is now on sale! It comes with 128GB of unified memory shared between the CPU and GPU. Better still, two DGX Sparks can be linked together to directly run a 405B-parameter model (at FP4 precision), approaching the scale of the largest open-source models available today. Despite this formidable capability, it is quiet and elegant, about the size of a Mac mini, and can be taken home for just $3,999!
For just $3,999, take home a mini AI supercomputer host!
NVIDIA's DGX Spark, only the size of Jensen Huang's palm, is officially on sale!
NVIDIA will start selling it on Wednesday, October 15, through Nvidia.com and third-party retailers.
This is not a desktop computer for ordinary consumers but a mini PC for AI developers, and it's finally ready to hit the market!
Let's first take a look at the specific parameters:
It is equipped with NVIDIA's GB10 Grace Blackwell superchip.
This mini computer, similar in size to a Mac mini, weighs 2.6 pounds (approximately 1.18 kg).
It offers 1 PFLOPS of FP4 AI performance.
It has 128GB of coherent unified system memory.
It comes with a ConnectX-7 SmartNIC.
It can support up to 4TB of storage.
Its dimensions are 150mm (length) x 150mm (width) x 50.5mm (height).
It is more a machine for AI work than a general-purpose computer.
DGX Spark runs NVIDIA's DGX OS, a customized version of Ubuntu Linux rather than Windows, and comes pre-configured with AI software.
Jensen Huang Fulfills His GTC Promise
The Era of Personal AI Supercomputing Has Arrived
At NVIDIA's GTC conference in March this year, Jensen Huang simultaneously launched two personal AI supercomputers, the DGX Spark and the DGX Station.
The Spark previously appeared under the name "Project DIGITS" and is billed as the "world's smallest AI supercomputer," comparable in size to a Mac mini!
The price of the larger Station model has not been announced yet. It is aimed mainly at "AI developers, researchers, data scientists, and students for prototyping, fine-tuning, and inference of large models on the desktop."
To celebrate the global launch of the DGX Spark, Jensen Huang traveled to SpaceX's Starbase facility in Texas and personally delivered the first batch of DGX Sparks to Elon Musk, SpaceX's chief engineer.
From the picture, we can also see that Musk wrote a note to Jensen Huang by hand.
From a single spark, A world of intelligence!
To Jensen, Ad astra!
The "J.H." written in the middle is Jensen Huang's signature.
Among them, "ad astra" is a Latin phrase meaning "to the stars" or "towards the stars," often used to express the spirit of exploration and the pursuit of excellence.
Back in 2016, Jensen Huang hand-delivered the first DGX-1 AI supercomputer to Elon Musk.
Nearly 10 years later, in 2025, Jensen Huang showed Musk the world's smallest AI supercomputer.
It's obvious that Jensen Huang has a true affection for Musk! (I wonder what Altman thinks about this!)
Netizens also joked that after receiving the Spark, Musk pulled a "MacroHard" out of the box to give to Jensen Huang, a gesture loaded with meaning.
In - Depth Review of NVIDIA's DGX Spark
A New Benchmark for Desktop AI Supercomputing
It is rare for NVIDIA to condense supercomputing-class performance into a desktop workstation; the DGX Spark is a groundbreaking attempt.
It brings the computing power of a data center to the desktop, allowing developers and researchers to have a personal AI supercomputer with quadrillions of operations per second on their desks.
Over the past year, the SGLang inference framework has gained popularity in the data-center field thanks to its excellent performance: it has not only posted strong results in the inference community but also been used to deploy complex models such as DeepSeek on large-scale clusters, applying techniques like Prefill-Decode disaggregation (PD) and Expert Parallelism (EP) to push large-scale inference performance and developer efficiency to new heights.
The emergence of the DGX Spark gives SGLang an opportunity to move from the data-center market to the personal-developer market, bringing its mature inference framework directly to more developers and researchers.
Appearance Design
The DGX Spark features a champagne-gold all-metal casing. The front and rear panels are made of porous metal foam, which has a distinctive texture and aids heat dissipation.
This design is reminiscent of the larger DGX A100 and H100 servers.
The machine's overall form is compact and distinctive. As the picture above shows, it is similar in size to a Mac mini, yet packs powerful computing capability.
Looking at the back, the DGX Spark offers an impressive array of interfaces: a power button, four USB-C ports (the left-most supports up to 240W power delivery), an HDMI video output, a 10GbE RJ-45 Ethernet port, and two QSFP network ports (driven by NVIDIA's ConnectX-7 NIC, providing a total bandwidth of 200Gb/s).
Such a comprehensive set of high-speed interfaces even allows two Sparks to be linked directly into a small two-node cluster to run larger AI models.
It's worth noting that the Spark draws power through a USB-C port, something rarely seen in desktop computers. Comparable high-performance compact machines (such as the Mac mini or Mac Studio) usually use a traditional grounded power cord for stable power delivery. NVIDIA's bold choice of USB-C likely serves to move the power adapter outside the case, freeing up valuable internal space for a larger heat-dissipation module.
This design yields an extremely compact body, but the drawback is that the power cord has no locking mechanism and is relatively easy to pull out by accident, so extra care is needed in daily use.
Hardware Configuration
The small DGX Spark packs amazing hardware performance.
At its core is an NVIDIA-customized GB10 Grace Blackwell superchip, which integrates 10 high-performance Cortex-X925 cores and 10 high-efficiency Cortex-A725 cores, for 20 cores in total.
This chip provides both general - purpose computing capabilities and a built - in powerful GPU module.
In terms of AI computing, the Blackwell GPU in the GB10 chip delivers 1 PFLOPS (one quadrillion floating-point operations per second) at sparse FP4 precision. Its AI inference capability sits roughly between the desktop RTX 5070 and 5070 Ti graphics cards.
The biggest highlight of the DGX Spark is its 128GB of coherent unified memory: the CPU and GPU share the same physical memory space and can seamlessly access each other's data.
Under this unified architecture, the Spark can directly load and run ultra-large models without copying data between system RAM and video memory, greatly reducing data-transfer overhead.
We can load a model with tens of billions of parameters into this 128GB of memory at once and run it.
Even more impressively, the Spark is expandable: via the dual QSFP network ports (200Gb/s bandwidth) on the back, two Sparks can be directly connected to form a two-node cluster.
According to NVIDIA, two linked Sparks can handle models with up to 405B parameters (at FP4 precision), approaching the scale of the largest open-source models available today.
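These capacity claims are easy to sanity-check with back-of-envelope arithmetic. The sketch below is a rough estimate that counts weight storage only (KV cache and activations add further overhead), showing why a 70B model fits on one Spark at FP4 while a 405B model needs two linked units:

```python
def model_bytes_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 70B model at FP4 (4 bits per weight) needs about 35 GB of weights,
# fitting comfortably in a single Spark's 128 GB of unified memory.
print(model_bytes_gb(70, 4))    # 35.0

# A 405B model at FP4 needs about 202.5 GB of weights -- too large for
# one Spark, but within the 256 GB offered by two linked units.
print(model_bytes_gb(405, 4))   # 202.5
```

The same function also shows why FP4 matters: at FP16 the 405B model would need roughly 810 GB, far beyond even a two-node Spark cluster.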
The DGX Spark condenses the data - center - level combination of "large memory + high - speed interconnection + top - tier GPU" into a machine weighing less than 2 kilograms, which is truly an engineering marvel.
Of course, every coin has two sides. The Spark's unified-memory bandwidth is relatively limited: the LPDDR5x memory it uses offers a total bandwidth of about 273GB/s.
For a GPU, this figure is far below the bandwidth of the dedicated VRAM on a professional graphics card (for example, the memory bandwidth of the data-center H100 GPU approaches 3TB/s), so memory bandwidth becomes the Spark's main bottleneck under heavy AI inference loads.
Nevertheless, the 128GB capacity still gives the Spark a unique advantage. After all, the video-memory capacity of most desktop systems is far from reaching this level. Many huge models simply "won't fit" on other devices, but the Spark makes them possible.
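The effect of the 273GB/s figure can be illustrated with a simple roofline-style estimate: during decoding, each generated token must stream the active model weights through the GPU, so memory bandwidth caps the tokens-per-second rate. This is an illustrative upper bound, not a measurement; it ignores KV-cache traffic and other overheads:

```python
def decode_upper_bound(bandwidth_gb_s: float,
                       active_params_b: float,
                       bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on decode speed (tokens/s): each token
    requires reading the active weights once from memory."""
    weight_gb = active_params_b * 1e9 * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / weight_gb

# A dense 70B model at FP4 on ~273 GB/s of memory bandwidth:
print(decode_upper_bound(273, 70, 4))   # 7.8 tokens/s at best
```

Mixture-of-experts models activate only a fraction of their weights per token, which is why MoE models such as GPT-OSS can decode far faster than a dense model of the same total size.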
Performance
Comprehensive performance tests have also been released.
Full review results: https://docs.google.com/spreadsheets/d/1SF1u0J2vJ-ou-R_Ry1JZQ0iscOZL8UKHpdVFr85tNLU/edit?pli=1&gid=0#gid=0
The review used two inference frameworks, SGLang and Ollama, to run a series of open-source large language models on the Spark and compared its performance against other devices.
The Spark can indeed load and run extremely large models such as GPT-OSS 120B and Llama 3.1 70B. However, such workloads on the Spark are better suited to prototype verification and experimental exploration than to production environments that demand high throughput.
The DGX Spark truly shines on small-to-medium-scale model inference. With batched parallel requests in particular, its throughput can be pushed to very high levels.
In the test of the GPT-OSS 20B model (using the Ollama framework), the Spark's prefill throughput is about 2053 tokens/s, and its decode speed is about 49.7 tokens/s.
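Figures like these can be derived from the token counts and nanosecond durations that Ollama's HTTP API reports in the final `/api/generate` response. The helper below is a small sketch; the sample dictionary uses illustrative numbers shaped like a real response, chosen to match the rates reported above:

```python
def throughput(resp: dict) -> tuple[float, float]:
    """Compute (prefill, decode) tokens/s from an Ollama /api/generate
    final response; the *_duration fields are in nanoseconds."""
    prefill = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    decode = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    return prefill, decode

# Illustrative numbers only (not a captured response):
sample = {
    "prompt_eval_count": 2053, "prompt_eval_duration": 1_000_000_000,
    "eval_count": 497, "eval_duration": 10_000_000_000,
}
print(throughput(sample))  # (2053.0, 49.7)
```

In practice one would POST a prompt to `http://localhost:11434/api/generate` with `"stream": false` and feed the returned JSON into this function.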
In contrast, the NVIDIA RTX Pro