DeepSeek open-sources its new model, the new architecture shines, and domestic AI chip makers rush to support it.
DeepSeek is one step closer to the next-generation architecture!
According to a September 30 report by Zhidx, DeepSeek yesterday announced the open-sourcing of the DeepSeek-V3.2-Exp experimental model. The model introduces the DeepSeek Sparse Attention mechanism for the first time, greatly improving the efficiency of long-text training and inference without significantly affecting output quality; DeepSeek describes it as "an intermediate step toward the next-generation architecture."
HuggingFace URL:
https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp
ModelScope Community URL:
https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.2-Exp
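DeepSeek's technical report details its exact indexer design, but the general idea behind sparse attention — each query attends only to a small, cheaply selected subset of keys rather than the full sequence — can be illustrated with a toy top-k sketch (NumPy, illustrative only; this is not DeepSeek's actual implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, top_k=4):
    """Toy top-k sparse attention: each query attends only to the
    top_k keys with the highest raw dot-product scores."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_q, n_k)
    # Keep only the top_k scores per query; mask out the rest with -inf
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    weights = softmax(scores + mask, axis=-1)        # zeros outside top_k
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (16, 8)
```

Because each query effectively mixes only `top_k` value rows instead of all of them, selection-based variants like this scale the expensive part of attention with the selected subset rather than the full sequence length, which is where the long-context savings come from.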
This improvement also lowers the cost of serving the new model. Accordingly, DeepSeek has rolled out new pricing, cutting the cost for developers to call the DeepSeek API by more than 50%.
The steepest cut is on output tokens: DeepSeek-V3.2-Exp charges only 3 yuan per million output tokens, a quarter of the price of the DeepSeek-V3.1 series models.
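Taking the article's figures as assumptions (output tokens now priced at a quarter of the V3.1 series price), the savings for a given workload are straightforward to estimate:

```python
# Rough API-cost estimate using the article's reported ratio (assumed figures).
OLD_PRICE_PER_M = 12.0                   # assumed V3.1 output price per million tokens
NEW_PRICE_PER_M = OLD_PRICE_PER_M / 4    # article: V3.2-Exp output costs 1/4 as much

def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost of generating `tokens` output tokens at the given per-million price."""
    return tokens / 1_000_000 * price_per_million

tokens = 50_000_000  # e.g. 50M output tokens in a month (hypothetical workload)
old = output_cost(tokens, OLD_PRICE_PER_M)
new = output_cost(tokens, NEW_PRICE_PER_M)
print(old, new, f"saving {1 - new / old:.0%}")  # saving 75%
```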
As of 6 a.m. on September 30th, cloud platforms such as Huawei Cloud, PPIO, and UCloud have announced the launch of DeepSeek-V3.2-Exp, and AI chip manufacturers such as Huawei, Cambricon, and Hygon Information have announced compatibility with DeepSeek-V3.2-Exp.
DeepSeek-V3.2-Exp is built on DeepSeek-V3.1-Terminus. On public benchmarks across various domains, the two models perform essentially on par, but DeepSeek-V3.2-Exp completes tasks with noticeably fewer tokens.
Currently, the DeepSeek app, web version, and mini-program have all switched to the DeepSeek-V3.2-Exp model. DeepSeek has also temporarily kept the DeepSeek-V3.1-Terminus API endpoint available so developers can run side-by-side comparisons.
Beyond the model itself, DeepSeek has open-sourced the accompanying technical report and code, and provides GPU kernels in both TileLang and CUDA versions so that researchers can experiment and optimize at different levels.
Technical report URL: https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf
DeepSeek adds that, as an experimental release, DeepSeek-V3.2-Exp has been validated on public benchmarks but still needs larger-scale testing in real user scenarios to rule out degraded performance in particular settings.
01.
Huawei, Hygon, and Cambricon Quickly Achieve Compatibility
Netizens Exclaim that the Second DeepSeek Moment is Coming
As soon as DeepSeek-V3.2-Exp launched, it drew a strong response from industry and developers, and many domestic companies immediately announced compatibility with and deployment of the model.
The official account of Huawei Computing posted an announcement stating that Ascend quickly completed compatibility and deployment on inference frameworks such as vLLM/SGLang, achieving Day 0 support for DeepSeek-V3.2-Exp and open-sourcing all inference code and operator implementations for developers. When generating 128K-token long sequences on Ascend hardware, DeepSeek-V3.2-Exp maintains a TTFT (time to first token) under 2 seconds and a TPOT (time per output token) under 30 milliseconds.
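TTFT and TPOT are computed from the token-arrival timestamps of a streaming response. A minimal, backend-agnostic sketch (synthetic timestamps; not tied to Ascend or any particular serving stack):

```python
def ttft_and_tpot(request_time: float, token_times: list[float]) -> tuple[float, float]:
    """TTFT = delay until the first token arrives; TPOT = mean gap between
    subsequent tokens. token_times are absolute arrival timestamps."""
    ttft = token_times[0] - request_time
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    tpot = sum(gaps) / len(gaps)
    return ttft, tpot

# Synthetic example: request at t=0, first token at 1.5 s, then one token every 25 ms.
times = [1.5 + 0.025 * i for i in range(100)]
ttft, tpot = ttft_and_tpot(0.0, times)
print(f"TTFT={ttft:.2f}s TPOT={tpot * 1000:.0f}ms")  # TTFT=1.50s TPOT=25ms
```

The synthetic numbers above would satisfy the reported Ascend targets (TTFT < 2 s, TPOT < 30 ms).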
Huawei Cloud was the first to launch DeepSeek-V3.2-Exp, serving the model on its CloudMatrix 384 supernode.
Four minutes after DeepSeek announced the open-sourcing of DeepSeek-V3.2-Exp, Cambricon posted that it had achieved Day 0 compatibility with the model and open-sourced its large-model inference engine vLLM-MLU.
Cambricon achieved rapid compatibility through Triton operator development, tuned performance with fused BangC operators, and reached high compute efficiency by overlapping computation with communication.
The DeepSeek-V3.2-Exp checkpoint weighs in at 671GB, and downloading it can take hours. Day 0 compatibility announced just four minutes after release suggests that Cambricon and DeepSeek had begun the adaptation work before the model came out.
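The "hours" figure is easy to sanity-check: at an assumed sustained 500 Mbps link (an illustrative number, not a measured one), a 671 GB download takes about three hours:

```python
SIZE_GB = 671          # reported checkpoint size
BANDWIDTH_MBPS = 500   # assumed sustained download bandwidth, in megabits/s

size_megabits = SIZE_GB * 1000 * 8      # GB -> megabits (decimal units)
seconds = size_megabits / BANDWIDTH_MBPS
print(f"{seconds / 3600:.1f} hours")    # ≈ 3.0 hours
```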
According to a report by Economic Observer Network, Hygon Information's DCU (Deep Computing Unit) has achieved Day 0 compatibility and optimization for DeepSeek-V3.2-Exp, enabling "zero-wait" deployment of large-model compute.
Under DeepSeek's post announcing the open-sourcing of DeepSeek-V3.2-Exp, many netizens shared their impressions of the model. One said they tested DeepSeek-V3.2-Exp on a roughly 100,000-token codebase and the speedup was very noticeable.
Some netizens remarked that the DeepSeek API is now almost free.
Some netizens even believe that the launch of this model may mean that the second DeepSeek moment is coming.
On Hugging Face, the community section for DeepSeek-V3.2-Exp is also full of discussion. The most-viewed post, however, is a "complaint" from a Chinese netizen: "Does this model really have to be updated right before National Day?"
Other netizens listed DeepSeek's past model-release dates, almost all of which fell a few days before Chinese public holidays.
02.
First-hand Experience of DeepSeek-V3.2-Exp
Architectural Innovation May Be More Important Than Performance Improvement
What are the differences in the user experience between DeepSeek-V3.2-Exp and the previous DeepSeek-V3.1-Terminus?
In programming tasks, DeepSeek-V3.2-Exp writes noticeably shorter code, producing fewer lines than DeepSeek-V3.1-Terminus for the same task.
To some extent, though, this costs it capability. A ball-bouncing animation written by DeepSeek-V3.2-Exp did not run correctly: the ball flew out of the hexagonal area. In an earlier Zhidx test, DeepSeek-V3.1-Terminus completed the same task flawlessly.
Zhidx also gave DeepSeek-V3.2-Exp an information-retrieval task: recommend several plants suitable for beginner balcony growers that grow quickly, bear fruit that can be eaten raw, are completely safe for children, and ideally come with simple sowing tips.
Compared with DeepSeek-V3.1-Terminus (left), DeepSeek-V3.2-Exp's answer (right) is shorter and the wording comparatively plain. Moreover, some of its recommendations, such as figs and passion fruit, require cuttings and frequent upkeep, failing the prompt's beginner-friendly requirement.
Performance of DeepSeek-V3.1-Terminus (left) and DeepSeek-V3.2-Exp (right) on the information-retrieval task (Source: Zhidx)
Overall, DeepSeek-V3.2-Exp does improve inference efficiency but makes some concessions in terms of capabilities.
Zhihu blogger @toyama nao found similar problems in his evaluation. He believes DeepSeek-V3.2-Exp has clear weaknesses in working memory and the stability of computational accuracy, and is also prone to cutting corners and falling into infinite loops.
Evaluation of DeepSeek-V3.2-Exp by Zhihu blogger @toyama nao
Other netizens echoed this view. One posted on X that they saw no improvement in the model and questioned why anyone should use a model with degraded capabilities.
As an experimental model, DeepSeek-V3.2-Exp's larger contribution may be at the theoretical level. DeepSeek says that, compared with DeepSeek-V3.1-Terminus, the only architectural modification in DeepSeek-V3.2-Exp is the introduction of DeepSeek Sparse Attention.