
Hands-on with NVIDIA's "nuclear bomb" of a personal AI supercomputer, the DGX Spark: can it fine-tune DeepSeek R2?

ifanr (爱范儿) · 2025-12-31 12:22
From now on, you can freely generate AI images and videos.

Recently, APPSO finally got its hands on the personal supercomputer Jensen Huang has been touting, the NVIDIA DGX Spark. Our first impression out of the box: "small yet beautiful."

This computer is incredibly small: not as bulky as the Mac Studio, and roughly the size of a Mac Mini. The shiny silver finish and the metal heat-dissipation mesh give it a distinctive, slightly hardcore look.

Previously, in our roundup of DGX Spark reviews from across the web, we covered some of this computer's specs: it weighs 1.2 kg and measures 5.05 × 15 × 15 cm.

Weight: Mac Studio (M4 Max) 2.74 kg; Mac Mini (M4) 0.67 kg

Dimensions: Mac Studio 9.5 × 19.7 × 19.7 cm; Mac Mini 5.0 × 12.7 × 12.7 cm

As for its computing power, it has 128GB of unified CPU+GPU memory, a GB10 Grace Blackwell superchip with GPU performance roughly comparable to an RTX 5070/5070 Ti, and LPDDR5X memory with 273 GB/s of bandwidth.

To be honest, running AI workloads locally is about the only use I can think of for a machine that pairs 128GB of memory with the compute of a GB10 GPU.

Anything that might involve privacy, whether a PDF document, pictures, text, or even video, I can safely entrust to this computer. Turn off the Wi-Fi, open a deployed project, load the downloaded open-source models, and it handles everything locally.

But does local processing really matter? For a while, ChatGPT was almost like my diary, and I would tell it everything. For ordinary consumers, being online or offline doesn't seem to be a particularly appealing selling point.

After actually using it, the roughly 30,000-yuan price doesn't seem so bad, and the Ubuntu Linux operating system isn't hard to get along with either. The often-criticized memory bandwidth does show when you watch responses come out word by word; the question is whether the strong compute and ample memory can make up for the waiting.
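
That wait is largely a memory-bandwidth story: when decoding, a dense model has to stream essentially all of its weights from memory for each token it emits, so bandwidth caps tokens per second. Here is a rough ceiling using our own back-of-the-envelope model (mixture-of-experts models such as gpt-oss only read their active experts per token, so they can beat this):

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound, dense LLM:
# every generated token streams all model weights through memory once.
bandwidth_gb_s = 273          # DGX Spark's LPDDR5X bandwidth
model_size_gb = 65            # e.g. a 65 GB quantized model
ceiling = bandwidth_gb_s / model_size_gb
print(f"~{ceiling:.1f} tokens/s upper bound")   # ~4.2 tokens/s
```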

Currently, this computer can also be found on e-commerce platforms like JD.com. We can bring this personal supercomputer home for around 32,000 yuan.

However, is it worth bringing home, and what can we do with it? Let's take a look at our experience over the past few days and see what a future AI computer should be like.

TL;DR:

This is a Linux desktop computer about the size of a Mac Mini, with 128GB of unified memory and an NVIDIA GB10 chip.

It can locally run models with up to 200 billion parameters, fine-tune and conduct inference tests on large models, and build various AI tools. It can generate images without an internet connection and has strong AI performance.

Although it's not a general-purpose computer, its complete full-stack AI development environment makes it a good fit for AI researchers, developers, and tech enthusiasts who want to quickly reproduce cutting-edge papers and validate ideas. It's not recommended for workloads unrelated to deep learning, such as video editing or gaming.

Generate Images and Videos Freely, Deploy Anything

Supporting models with up to 200 billion parameters means that many models in the open-source market can now be directly run on this computer.

There are many platforms for deploying large models locally; the most common are the open-source Open WebUI and the closed-source but free LM Studio. We used Open WebUI, which works with open-source backends like Ollama, a framework designed for running large language models efficiently on a local machine.

The Ollama official website provides a rich selection of open-source models for download | https://ollama.com/models

We first deployed OpenAI's gpt-oss 20b to see how it performed. The speed was just average, but it was usable.
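
For a sense of what sits under the chat window, here is a minimal sketch of querying such a deployment through Ollama's local HTTP API. It assumes the default port 11434 and that the model was pulled under the tag gpt-oss:20b (check the exact tag with ollama list); the final streamed chunk carries timing statistics, which doubles as a quick tokens-per-second meter.

```python
import json
import requests

# Stream a completion from a local Ollama server (default endpoint,
# assuming `ollama pull gpt-oss:20b` has already been run).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gpt-oss:20b", "prompt": "Explain unified memory in one paragraph."},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        # The last chunk reports token counts and durations in nanoseconds.
        tps = chunk["eval_count"] / (chunk["eval_duration"] / 1e9)
        print(f"\n~{tps:.1f} tokens/s")
```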

The device is simultaneously processing a video generation task.

Then we downloaded the 65GB gpt-oss 120b model, and it was obvious the DGX was under pressure: thinking time, time to first token, and token throughput all degraded noticeably, dropping well below our reading speed.

Jensen Huang said it supports models with up to 200 billion parameters, so we tried the 235-billion-parameter Qwen3 235B model, a 142GB download. Within a few seconds, the whole process was killed.
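
The arithmetic explains the crash: the quantized weights alone outweigh the unified memory, so the model can never fully load, before a single token is generated. A quick sanity check (the overhead figure is our assumption, not a measurement):

```python
# Why the 235B model gets killed: its weights don't fit in memory.
weights_gb = 142     # download size of the Qwen3 235B model
memory_gb = 128      # DGX Spark unified CPU+GPU memory
overhead_gb = 5      # assumption: KV cache + runtime buffers, rough guess
shortfall = weights_gb + overhead_gb - memory_gb
print(f"short by ~{shortfall} GB")   # ~19 GB over budget
```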

We continued with image, video, and 3D model generation. ComfyUI is arguably the most user-friendly open-source image-generation platform: using its template workflows, we only needed to download the model files and drop them into the corresponding folders, such as those for LoRA weights, text encoders/decoders, and diffusion models (sketched below).

Download the corresponding diffusion models, audio, and LoRA models locally to generate AI videos offline.
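
As a sketch of that download-and-drop-into-folders step, something like the following works if ComfyUI lives at ~/ComfyUI, using the huggingface_hub library. The repository and file names here are placeholders, not the exact checkpoints we used.

```python
from pathlib import Path
from huggingface_hub import hf_hub_download

MODELS = Path.home() / "ComfyUI" / "models"

# Placeholder repo/file names: substitute the checkpoints your workflow needs.
downloads = [
    ("some-org/some-video-model",  "diffusion.safetensors",    "diffusion_models"),
    ("some-org/some-text-encoder", "text_encoder.safetensors", "text_encoders"),
    ("some-org/some-lora",         "lora.safetensors",         "loras"),
]

for repo_id, filename, subdir in downloads:
    # hf_hub_download fetches (and caches) the file; local_dir places a copy
    # where the ComfyUI template workflow expects to find it.
    path = hf_hub_download(repo_id=repo_id, filename=filename,
                           local_dir=MODELS / subdir)
    print("saved:", path)
```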

Even Google's closed-source Veo 3.1 still gates video generation behind lottery-style access; with open-source models, results depend that much more on careful prompt control. And beyond the final output, generation speed remains a major issue.

Even with 128GB of memory, generating a 10-second, 240-frame video with Tencent's Hunyuan 1.5 pushed the machine hard: GPU utilization hit 96% and memory usage approached 90GB.
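
Those figures came from watching the system monitor during generation; a small polling script gives the same view. Here is a sketch using the pynvml bindings (pip install nvidia-ml-py); note that with the GB10's unified architecture, the reported "used" memory reflects the shared CPU+GPU pool.

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Poll GPU utilization and memory once per second while a job runs.
for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {util.gpu:3d}%   mem {mem.used / 1e9:6.1f} GB")
    time.sleep(1)

pynvml.nvmlShutdown()
```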

I can understand why Altman temporarily halted work on Sora 2 to focus on model development. Video generation is extremely computationally intensive.

There are more options for image generation, such as Qwen, FLUX, and Z-Image. They all perform well, and the generation speed is not too slow.

Prompt: Anime style, masterpiece, Studio Ghibli style. A huge, rusty fighter mech is half-buried in lush green grass. The mech is covered with bright wildflowers and thick moss, as nature is reclaiming technology. A cinematic wide-angle shot, with huge cumulonimbus clouds in the bright blue sky, soft sunlight piercing through them, with a lens flare effect, creating a peaceful and idyllic atmosphere, high detail.

NVIDIA provides an official, detailed playbook for getting started with the DGX Spark, covering deployment end to end, whether that's pairing it with a Mac or linking two DGX Sparks to run a project together.

In our earlier review roundup we mentioned knowledge graphs, video summarization, and more, all of which are covered in the playbook. We also deployed a knowledge graph of our own: we can keep uploading new material, and the underlying large language model automatically updates the graph based on the newly added content.

The knowledge graph looks pretty cool and can even be displayed in 3D. For more ways to use the machine, see the playbook.

What Is Fine-Tuning, and Can We Fine-Tune a DeepSeek R2?

Deploying existing large models locally might not be enough. With a supercomputer at hand, can I train a DeepSeek R2?

Not really. Training a large model from scratch requires a huge dataset and sophisticated algorithm design, and the compute needed for pre-training is far beyond what a desktop supercomputer can provide.

What about fine-tuning? Everyone is talking about fine-tuning pre-trained large models to improve their performance.

Fine-tuning | Image source: Dive into Deep Learning

Fine-tuning refers to adjusting the parameters of a general large model using supervised or reinforcement learning methods on a specific dataset to optimize its performance on specific tasks.

We used the open-source framework LLaMA Factory to fine-tune Llama 3, a model that was only ever half open-sourced and has since been left behind, to see what the effect would be.

We again followed NVIDIA's official guidance and used the publicly available fine-tuning configuration, namely LoRA (Low-Rank Adaptation). If you've deployed Stable Diffusion before, you'll be familiar with LoRA: it's an efficient fine-tuning technique that trains only a small set of newly added parameters rather than all the parameters of a large language model.
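
Conceptually, LoRA freezes the pretrained weight matrix W and learns a low-rank update BA beside it, so the layer computes y = Wx + (BA)x with far fewer trainable parameters. A minimal PyTorch sketch of the idea, illustrative rather than LLaMA Factory's actual implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze pretrained weights
        # Only these two small matrices are trained: B (out x r) and A (r x in).
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scaling * (B A) x ; gradients flow only into A and B.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

# With rank r = 8 on a 4096 x 4096 layer, LoRA trains ~65K parameters
# instead of ~16.8M, which is why it fits on a single desktop machine.
layer = LoRALinear(nn.Linear(4096, 4096))
```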

The open-source tool LLaMA Factory ships supervised fine-tuning (SFT) LoRA configuration files for models such as DeepSeek and Qwen.

In the Llama 3 fine-tuning configuration file that LLaMA Factory provides, the training data is specified as "dataset: identity,alpaca_en_demo". The identity dataset is typically used to rewrite the model's self-identity.
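
To make the workflow concrete, a run can be launched roughly as follows. The field names follow LLaMA Factory's published example configs, and llamafactory-cli is the entry point the package installs, but the values below are illustrative rather than the exact recipe we used.

```python
import pathlib
import subprocess
import textwrap

# Minimal LoRA SFT config in the style of LLaMA Factory's examples.
config = textwrap.dedent("""\
    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
    stage: sft
    do_train: true
    finetuning_type: lora
    lora_target: all
    dataset: identity,alpaca_en_demo
    template: llama3
    output_dir: saves/llama3-8b/lora/sft
    per_device_train_batch_size: 1
    learning_rate: 1.0e-4
    num_train_epochs: 3.0
""")
pathlib.Path("llama3_lora_sft.yaml").write_text(config)

# Kick off supervised fine-tuning with the written config.
subprocess.run(["llamafactory-cli", "train", "llama3_lora_sft.yaml"], check=True)
```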

For example, when we ask