
Can the 30,000 RMB personal supercomputer that Jensen Huang hand-delivered to Elon Musk only run smoothly with the help of a Mac Studio? The first real-world experiences are here.

ifanr · 2025-11-22 17:00
Is spending 30,000 RMB on this supercomputer a waste of money?

Able to run models with up to 200 billion parameters, priced at 30,000 RMB, and equipped with 128GB of memory: can this machine, billed as the "world's smallest supercomputer", really let us run large models on our desktops?

Some time ago, Jensen Huang personally delivered one of these supercomputers to Elon Musk; later he also visited OpenAI headquarters and handed one to Sam Altman. From its debut at CES to its launch now, this personal supercomputer is finally within reach.

On NVIDIA's official site it is priced at $3,999, and versions are also sold by seven PC brands, including ASUS, Lenovo, and Dell. Link: https://marketplace.nvidia.com/en-us/developer/dgx-spark/

The NVIDIA DGX Spark is a personal AI supercomputer aimed at researchers, data scientists, students, and the like, giving them high-performance, desktop-class AI compute for developing and experimenting with AI models.

It sounds very powerful, but the uses an ordinary person can imagine boil down to:

  • Run large models locally: your chats stay entirely on your own computer, so the conversation never leaves your machine (see the sketch after this list).
  • Create locally: Generate pictures and videos without restrictions, saying goodbye to memberships and points.
  • Build a personal assistant: Feed all your own data to it and train a "Jarvis" that only understands you.
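
To make the first point concrete, here is a minimal sketch of chatting with a locally hosted model through Ollama's REST API. The port is Ollama's default; the model name is just an example and assumes you have pulled it beforehand.

```python
import requests

# Minimal local chat via Ollama's REST API. Assumes `ollama serve` is
# running on its default port and the model has been pulled first
# (e.g. `ollama pull llama3.1:8b` -- the model name is an example).
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Summarize my meeting notes."}],
        "stream": False,  # one JSON reply instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()

# Both the prompt and the reply stay on this machine.
print(resp.json()["message"]["content"])
```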

The price of an A100 shown on some GPU rental platforms is about 7 RMB per hour.

In fact, the GB10 Grace Blackwell superchip inside the DGX Spark may broaden its application scenarios. But what exactly can it do, and how well does it perform? For 30,000 RMB you could rent an A100 for some 4,000 hours; would you really put this box on your desk to run large models?

We have collected multiple detailed reviews of the DGX Spark from around the internet. Ahead of our own hands-on experience, we try to show you whether this device is really worth 30,000 RMB.

TL;DR:

1. Performance positioning: It excels with lightweight models and can also run 120-billion-parameter models stably. Overall, its performance sits between the RTX 5070 and the RTX 5070 Ti.

2. Biggest shortcoming: The 273 GB/s memory bandwidth is the bottleneck. Compute is plentiful, but data movement is slow; the experience is like a person with a quick mind and a stutter.

3. Unconventional usage: Use a Mac Studio M3 Ultra to "assist" it. The DGX Spark is responsible for fast thinking, and the Mac Studio is responsible for smooth expression, forcibly solving the "stuttering" problem.

4. Rich ecosystem: NVIDIA provides more than 20 out-of-the-box usage scenarios, from video generation to building multi-agent assistants. It ships as a complete AI package.

Only a little better than the Mac Mini?

Without further ado, let's first look at the data.

Average tokens prefilled and decoded per second; the DGX Spark ranks behind the RTX 5080. Chart made with ChatGPT.

The DGX Spark is far stronger than the Mac Mini M4 Pro, especially in the Prefill stage. In the Decode stage, however, the lead is much smaller: on the open-source DeepSeek R1 model the Mac Mini M4 Pro averages 17.8 TPS, while the DGX Spark manages only 33.1, less than a 2× advantage.

Let's quickly explain some terms to understand the two stages of AI inference.

Simply put, when we enter a question in the AI chat box, the process of the model generating an answer can be divided into two key steps:

1. Prefill (Pre-filling/Reading and Comprehension stage)

After the AI receives our question, it rapidly reads and understands every word of the input (i.e., the prompt).

The faster this stage is, the shorter the wait before the AI outputs its first word, i.e., the shorter the Time To First Token (TTFT), a metric often used to advertise AI capability.

Apple promotes the capabilities of the M5 chip using the first token response speed.

2. Decode (Decoding/Answer Generation stage)

It's like the AI has already thought of the answer and starts typing it out word by word for us.

It determines the typing speed of the AI, which is what we often call TPS (Tokens Per Second). The higher this value, the faster we can see the complete answer displayed.

💡 Tips: What is TPS?

TPS is the abbreviation of Tokens Per Second. It can be understood as the work efficiency or typing speed of the AI.

TPS in the Prefill stage: Represents the speed at which the AI understands the question.

TPS in the Decode stage: Represents the speed at which the AI generates an answer for us.
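
If you want to see these numbers for yourself, the sketch below streams a response from a local Ollama server and reports TTFT along with prefill and decode TPS, using the timing fields Ollama returns in the final streamed chunk. The port is the default and the model name is an assumption.

```python
import json
import time

import requests

# Measure TTFT plus prefill/decode TPS against a local Ollama server.
# Assumes the default port; the model name is only an example.
start = time.perf_counter()
ttft = prefill_tps = decode_tps = None
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Explain prefill vs. decode.", "stream": True},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        if ttft is None and chunk.get("response"):
            ttft = time.perf_counter() - start  # time to first token
        if chunk.get("done"):
            # Ollama reports these durations in nanoseconds.
            prefill_tps = chunk["prompt_eval_count"] / (chunk["prompt_eval_duration"] / 1e9)
            decode_tps = chunk["eval_count"] / (chunk["eval_duration"] / 1e9)

print(f"TTFT {ttft:.2f}s | prefill {prefill_tps:.1f} tok/s | decode {decode_tps:.1f} tok/s")
```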

So when the DGX Spark answers us, the first word appears quickly, but its subsequent typing speed is slow. Bear in mind that the Mac Mini M4 Pro costs only 10,999 RMB for the version with 24GB of unified memory.

Why is this the case? The test was carried out by the LMSYS team behind the large-model arena. They selected six different devices and ran multiple open-source large language models on both their SGLang project and Ollama.

SGLang is a high-performance inference framework developed by the LMSYS team. FP8, MXFP4, q4_K_M, and q8_0 refer to quantization formats for large language models: ways of compressing a model's weights into different lower-precision binary representations.
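
Why quantization matters here is simple arithmetic: lower-precision weights shrink the memory footprint. A back-of-the-envelope calculation (weights only, ignoring KV cache and runtime overhead, and treating q4_K_M as roughly 4 bits per weight) shows why a 120-billion-parameter model only fits in the Spark's 128GB once quantized:

```python
# Rough weights-only memory for a 120-billion-parameter model at the
# precisions mentioned above. Ignores KV cache, activations, and format
# metadata, so real footprints run somewhat higher.
PARAMS = 120e9
for name, bits in [("FP16", 16), ("FP8 / q8_0", 8), ("MXFP4 / q4_K_M (~4-bit)", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    verdict = "fits" if gb <= 128 else "does NOT fit"
    print(f"{name:>24}: ~{gb:.0f} GB -> {verdict} in 128 GB")
```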

The test items ranged from local large models with 120 billion parameters down to smaller 8-billion-parameter models. In addition, the batch size and the differences between the SGLang and Ollama frameworks each had their own impact on the DGX Spark's performance.

For example, the evaluation team mentioned that when the batch size of the DGX Spark is set to 1, the number of tokens decoded per second is only 20. However, when the batch size is set to 32, the number of tokens decoded per second rises to 370.

Generally speaking, the larger the batch size, the more content needs to be processed each time, and the higher the performance requirements for the GPU.
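
As a rough illustration of the batch-size effect, the sketch below fires N concurrent requests at a locally served OpenAI-compatible endpoint (for example an SGLang server) and reports aggregate decode throughput. The port, model name, and prompt are assumptions for illustration; real benchmarks like the LMSYS one control far more variables.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

# Fire N concurrent requests at a local OpenAI-compatible endpoint (e.g.
# an SGLang server) to mimic growing batch sizes. Port, model name, and
# prompt are assumptions for illustration.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

def one_request() -> int:
    out = client.chat.completions.create(
        model="default",
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        max_tokens=128,
    )
    return out.usage.completion_tokens  # tokens decoded for this request

for batch in (1, 8, 32):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=batch) as pool:
        total = sum(pool.map(lambda _: one_request(), range(batch)))
    elapsed = time.perf_counter() - start
    # Aggregate decode throughput should climb with batch size, echoing
    # the 20 -> 370 tok/s jump LMSYS measured from batch 1 to 32.
    print(f"batch={batch:>2}: {total / elapsed:.1f} tok/s aggregate")
```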

The DGX Spark's AI capability, built on the GB10 Grace Blackwell chip architecture with 1 PFLOPS of sparse FP4 tensor performance, positions it between the RTX 5070 and the RTX 5070 Ti.

So the results chart shown at the beginning cannot fully capture the DGX Spark's capability, because it averages the results of all model tests; actual performance varies with the batch size of inference and the parameter count of the model.

Overall, the advantages of the DGX Spark are:

  • Strong computing power: It can handle large-scale tasks, with core AI capability at the RTX 5070 level.
  • Large memory: 128GB of unified memory lets it comfortably run models with hundreds of billions of parameters.