
Does the 30,000-yuan personal supercomputer Jensen Huang gave Elon Musk need help from a Mac Studio to run smoothly? The first real hands-on reports are in.

爱范儿 (iFanr), 2025-11-22 17:00
Is spending 30,000 yuan on a supercomputer an "IQ tax"?

200 billion parameters, 30,000 yuan, 128 GB of RAM. Can this machine, known as the "world's smallest supercomputer", really run large models on our desks?

Recently, Jensen Huang officially handed over this supercomputer to Elon Musk and then personally visited OpenAI's headquarters to give it to Altman. From its premiere at CES to today's market launch, this personal supercomputer is finally in our hands.

Availability on the official site: priced at $3,999, with versions from seven PC makers including Asus, Lenovo, and Dell also on sale. Link: https://marketplace.nvidia.com/en-us/developer/dgx-spark/

The NVIDIA DGX Spark is a personal AI supercomputer targeting researchers, data scientists, and students. It provides them with powerful desktop AI computing capabilities to facilitate the development and innovation of AI models.

It sounds very impressive, but the use cases that come to an average user's mind are still basically:

  • Run large models locally: chat content stays only on your own computer, so it never leaves your hands.
  • Create locally: generate images and videos without restrictions, free of memberships and credits.
  • Build a personal assistant: feed it all your own data and train a "Jarvis" that understands only you.

On some graphics card rental platforms, the price for an A100 is listed as 7 yuan per hour.

In fact, the performance of the DGX Spark's GB10 Grace Blackwell Superchip could open up broader uses. But what can it actually do, and how well does it perform? At 30,000 yuan, you could rent an A100 for about 4,000 hours. Would you really put it on your desk to run large models?

We have collected several detailed tests of the DGX Spark from around the Internet to see whether this device is really worth 30,000 yuan, ahead of our own hands-on experience.

TL;DR:

1. Performance positioning: it excels at lightweight models and can also run a 120-billion-parameter model stably. Overall performance sits between the RTX 5070 and the RTX 5070 Ti.

2. Biggest obstacle: The memory bandwidth of 273 GB/s is the limiting factor. The computing power is sufficient, but the data transfer is slow. The experience is like a person who thinks very fast but stutters when speaking.

3. Unconventional pairing: team it up with a Mac Studio M3 Ultra. The DGX Spark handles the fast thinking, while the Mac Studio keeps the output flowing, solving the "stuttering" problem.

4. Rich ecosystem: the official site offers over 20 ready-to-use applications, from video generation to building a multi-agent assistant. The entire AI suite is ready for you.
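Why bandwidth matters so much can be sketched with a back-of-the-envelope roofline estimate: during decode, each generated token requires streaming roughly the entire set of model weights through memory once, so bandwidth caps the achievable tokens per second. The model size below is an illustrative assumption, not a measured configuration.

```python
def max_decode_tps(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode tokens/s for a dense model:
    each token reads ~all weights once, so TPS <= bandwidth / weights."""
    return bandwidth_gb_s / model_size_gb

# DGX Spark: 273 GB/s unified memory bandwidth.
# Hypothetical dense model: 70B parameters at 4-bit, roughly 35 GB of weights.
print(round(max_decode_tps(273, 35), 1))  # prints 7.8 (tokens/s ceiling)
```

Real models (especially mixture-of-experts ones) read only a fraction of their weights per token, so measured numbers can be higher, but the principle stands: compute is rarely the decode bottleneck on this class of hardware.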

Only a little better than a Mac Mini?

Without further ado, let's first look at the data.

The average number of tokens processed and decoded per second. The DGX Spark lags behind the RTX 5080. The image was created by ChatGPT.

The DGX Spark clearly beats the Mac Mini M4 Pro, especially in the prefill phase; in the decode phase the lead is less pronounced. On the DeepSeek R1 open-source model, the Mac Mini M4 Pro averages 17.8 TPS, while the DGX Spark reaches 33.1.

Let's quickly explain the terminology: what are the two phases of AI inference?

Simply put, the process in which the model generates an answer after we enter a question into the AI chat window can be divided into two key phases:

1. Prefill (pre-filling / prompt-understanding phase)

After the AI receives our question, it quickly reads and understands each word we entered (i.e., the prompt).

The faster this phase completes, the shorter the wait until the AI outputs its first word. This is measured by the common performance indicator Time To First Token (TTFT).

Apple uses the response speed of the first token to promote the performance of the M5 chip.

2. Decode (decoding / answer-generation phase)

It's as if the AI has already thought of the answer and starts to output it word by word.

This determines the AI's typing speed, the commonly quoted TPS (tokens per second). The higher the value, the faster the complete answer appears.

💡 Tips: What is TPS?

TPS stands for Tokens per Second and can be understood as the AI's working pace, or typing speed.

The TPS in the prefill phase: Represents the speed at which the AI understands the question.

The TPS in the decode phase: Represents the speed at which the AI generates the answer for us.
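Both metrics are easy to compute once you record when each output token arrives. Here is a minimal Python sketch using synthetic timestamps (there is no real model behind it):

```python
def ttft_and_decode_tps(token_times: list[float], start: float) -> tuple[float, float]:
    """Derive Time To First Token and decode tokens/s from arrival times.

    token_times: wall-clock time at which each output token arrived.
    start: the moment the request was sent.
    """
    ttft = token_times[0] - start
    decode_window = token_times[-1] - token_times[0]
    # Tokens generated after the first one, divided by the time they took.
    tps = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return ttft, tps

# Synthetic stream: first token after 0.5 s, then one token every 0.05 s.
times = [0.5 + 0.05 * i for i in range(101)]
ttft, tps = ttft_and_decode_tps(times, start=0.0)
print(round(ttft, 2), round(tps, 1))  # prints 0.5 20.0
```

In a real setup you would collect `token_times` from a streaming API response; the arithmetic is the same.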

So when the DGX Spark answers us, the first word appears very quickly, but its subsequent typing speed is underwhelming. Keep in mind that the Mac Mini M4 Pro costs only 10,999 yuan, and that is the version with 24 GB of unified memory.

Why is that? This test was conducted by the team behind the large-model arena, LMSYS. They picked six different devices and ran several open-source large language models with their SGLang project and with Ollama.

SGLang is a high-performance inference framework developed by the LMSYS team. FP8, MXFP4, q4_K_M, and q8_0 are quantization formats for large language models: ways of compressing a model by storing its weights in different low-bit binary representations.
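What those formats mean in practice is easiest to see in memory terms. The bits-per-weight figures below are approximations (q4_K_M, for example, averages a bit over 4 bits per weight), and the calculation covers weights only, ignoring the KV cache and activations:

```python
# Approximate bits per weight for common quantization formats.
BITS_PER_WEIGHT = {
    "FP16": 16,
    "FP8": 8,
    "q8_0": 8,      # llama.cpp 8-bit integer format
    "MXFP4": 4,     # 4-bit microscaling floating point
    "q4_K_M": 4.5,  # averages roughly 4.5 bits/weight (approximation)
}

def weights_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight storage in GB for a given parameter count."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[fmt] / 8 / 1e9

# A 120B-parameter model in different formats (weights only):
for fmt in ("FP16", "FP8", "q4_K_M"):
    print(fmt, weights_gb(120, fmt))  # 240.0, 120.0, 67.5 GB
```

This is why quantization matters on a 128 GB machine: at FP16 a 120B model does not fit at all, while at around 4 bits it fits with room left for the KV cache.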

The test lineup included both a local large model with 120 billion parameters and a smaller model with 8 billion parameters. In addition, the batch size and the choice between the two frameworks, SGLang and Ollama, each affect the DGX Spark's performance differently.

For example, the test team noted that at a batch size of 1 the DGX Spark decodes only about 20 tokens per second, but with the batch size set to 32 the aggregate decode rate climbs to 370 tokens per second.

Generally speaking, the larger the batch size, the more content is processed per pass and the higher the demands on the GPU.
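The quoted figures illustrate the trade-off: batching multiplies aggregate throughput, but each individual request inside the batch sees only a fraction of it. A small sketch using the numbers from the LMSYS test:

```python
def per_request_tps(aggregate_tps: float, batch_size: int) -> float:
    """Tokens/s experienced by a single request inside a batch."""
    return aggregate_tps / batch_size

# LMSYS figures for the DGX Spark:
print(per_request_tps(20, 1))              # batch 1:  prints 20.0 tokens/s per request
print(round(per_request_tps(370, 32), 1))  # batch 32: prints 11.6 tokens/s per request
```

So batching pays off when serving many users at once, but a single chat session actually receives tokens more slowly.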

The DGX Spark's AI performance sits between the RTX 5070 and the RTX 5070 Ti thanks to the architecture of its GB10 Grace Blackwell chip and its 1 petaFLOP of sparse FP4 tensor compute.