
Elon Musk exclaims that it's incredibly powerful: a domestic 0.8B model has been open-sourced. Netizens: I can't wait to run it on my iPhone.

Zhidx (智东西), 2026-03-03 17:31
Four models start at 0.8B, with the 9B model's performance surpassing GPT-5 nano.

According to a Zhidx report on March 3rd, last night Alibaba open-sourced four small models in the Qwen3.5 series: Qwen3.5-0.8B, Qwen3.5-2B, Qwen3.5-4B, and Qwen3.5-9B, all of which can be deployed on edge devices such as laptops.

▲Partial screenshot of Qwen3.5's post on social platform X

As soon as the models were released, the developer community discussed them enthusiastically, and even Elon Musk dropped by the comment section under Qwen's post on social platform X, writing: "Amazing intelligence density." With his own Grok 4.2 about to be released, Musk's attention to his Chinese large-model peers remains undiminished.

▲Elon Musk's comment under the post

In benchmark evaluations covering instruction following, multilingual understanding, and visual reasoning, Qwen3.5-9B took the lead on many tests, including GPQA Diamond, MMMU-Pro, ERQA, and Video-MME, significantly outperforming GPT-OSS-20B, GPT-5 nano, and Gemini 2.5 Flash-Lite, as well as Alibaba's own Qwen3-Next-80B-A3B-Thinking and Qwen3-30B-A3B-2507.

Despite its size, Qwen3.5-4B performs close to Qwen3.5-9B. It can compete with larger models in tasks such as multilingual knowledge, visual reasoning, and document understanding, though a gap remains in pure mathematical reasoning, a common weakness of small models.

▲Evaluation results of Qwen3.5-9B and Qwen3.5-4B

These small models are all built on the unified Qwen3.5 base and open-sourced under the Apache 2.0 license, permitting commercial use. They support both LoRA and full fine-tuning, so task adaptation can run on consumer-grade GPUs. Specifically:

The 0.8B/2B versions are small and fast, making them the first choice for on-device deployment. They are well suited to mobile devices, IoT edge devices, and low-latency real-time interaction scenarios.

The 4B version offers stronger performance and is a multimodal base model, suitable as the core brain of lightweight agents while balancing performance and resource consumption.

The 9B version has a compact structure, yet its performance is comparable to that of gpt-oss-120B. It suits server-side deployment where high intelligence is needed but GPU memory is limited, making it a highly cost-effective general-purpose choice.
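The claim that consumer GPUs can handle fine-tuning can be sanity-checked with rough arithmetic: LoRA trains only small low-rank adapters rather than the full model. The hidden size, layer count, and rank below are illustrative assumptions, not Qwen3.5's published configuration:

```python
# Rough sketch of why LoRA fine-tuning fits on consumer GPUs: it trains
# only two low-rank factors (d x r and r x d) per adapted weight matrix.
# Hidden size, layer count, and rank below are ASSUMED for illustration.

def lora_trainable_params(d_model: int, n_layers: int,
                          n_matrices: int, rank: int) -> int:
    """Total trainable parameters added by LoRA adapters."""
    per_matrix = 2 * d_model * rank          # A (d x r) plus B (r x d)
    return n_layers * n_matrices * per_matrix

# Assumed 4B-class shape: hidden size 2560, 36 layers, adapting the
# four attention projections (q, k, v, o) at rank 16.
trainable = lora_trainable_params(d_model=2560, n_layers=36,
                                  n_matrices=4, rank=16)
print(f"{trainable:,} trainable params "
      f"({trainable / 4e9:.2%} of a 4B model)")   # 11,796,480 (0.29%)
```

Under these assumed numbers, the adapters amount to well under 1% of the model's weights, which is why the optimizer state and gradients fit comfortably alongside the frozen base model on a single consumer card.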

Currently, all models have been open-sourced on the ModelScope community and Hugging Face, along with the corresponding base models.

Since the official release of the Qwen3.5 model in mid-February, many domestic and foreign developers had been "urging" the team to ship small-scale versions. With today's release, developers immediately joined the discussion and began trying them out.

One developer said: "The real highlight is that the 9B beat GPT-5-Nano by 13 points on MMMU-Pro. A model that runs on a laptop outperforms the cloud-based flagship Nano model. Architectural advantage far outweighs parameter count."

▲Comments from netizens on social platform X

Another developer said: "This is much more powerful than people think. A Qwen 3.5 running on a Mac mini combined with OpenClaw running 24/7 can create an AI employee with a cost less than the monthly salary of a junior employee."

One developer shared that, using an AMD Ryzen AI Max+ 395 processor with Q4_K_XL quantization and a full 256k context window enabled, processing speed reached about 30 tokens/s while using less than 16GB of memory. He exclaimed: "My goodness! Although Qwen3.5-9B is small, its performance is very strong: excellent multilingual ability, a rich store of general knowledge, and strong handling of visual input."
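That memory figure is plausible from back-of-envelope arithmetic. The bits-per-weight for Q4_K_XL (~4.8) and the KV-cache shape below are assumptions for illustration, not Qwen3.5's published configuration or measured values:

```python
# Back-of-envelope check of the "under 16GB" figure for a 9B model at
# 4-bit quantization. Bits-per-weight (~4.8 for a Q4_K-style scheme) and
# the GQA cache shape are ASSUMPTIONS, not Qwen3.5's published config.
GIB = 2**30

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def kv_bytes_per_token(n_layers: int, n_kv_heads: int,
                       head_dim: int, bytes_per_elem: int) -> int:
    """KV-cache cost per token: a K and a V entry for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

w = weights_gib(9e9, 4.8)               # ~5.0 GiB of quantized weights
kv = kv_bytes_per_token(36, 4, 128, 2)  # fp16 cache, assumed GQA shape
print(f"weights: {w:.1f} GiB, KV cache: {kv / 1024:.0f} KiB per token")
```

Under these assumed numbers, the weights take about 5 GiB; an fp16 KV cache of this shape would reach roughly 18 GiB at the full 256k context, which is why long-context runs typically quantize the KV cache as well to stay inside a 16GB budget.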

▲Comments from netizens on social platform X

It is worth mentioning that some netizens said that Qwen3.5 can not only run on any laptop but also on mobile phones. As soon as this statement was made, many people asked: "How to run it on an iPhone?"

▲Comments from netizens on social platform X

However, one developer said bluntly: "The 4B model is an intelligent auto-completion tool, not a thinking partner. Its accuracy on GPQA Diamond (graduate-level reasoning) is about 45%, and on the HMMT math test about 15%. That means it gets more than half of the hard questions wrong."

▲Comments from netizens on social platform X

Although small models have limited capabilities, they have matched cloud-deployed models such as Gemini 3 Flash on specific capability dimensions, which means they can already play a practical role in many on-device and edge scenarios.

So far, the open-sourced Qwen3.5 family comprises:

1 large-sized model: Qwen3.5-397B-A17B

3 medium-sized models: Qwen3.5-122B-A10B, Qwen3.5-35B-A3B, Qwen3.5-27B

4 small-sized models: Qwen3.5-0.8B, Qwen3.5-2B, Qwen3.5-4B, Qwen3.5-9B

Hugging Face address: https://huggingface.co/collections/Qwen/qwen35

ModelScope community address: https://modelscope.cn/collections/Qwen/Qwen35

Appendix: Complete evaluation results of Qwen3.5-9B and Qwen3.5-4B

This article is from the WeChat official account “Zhidx” (ID: zhidxcom), author: Li Shuiqing, published by 36Kr with authorization.