
The DeepSeek-V3.2-Exp model has been released and open-sourced, and the API prices have been significantly reduced.

Friends of 36Kr · 2025-09-29 20:09
Under the new pricing policy, the cost for developers to call the DeepSeek API will be reduced by more than 50%.

On September 29, the DeepSeek-V3.2-Exp model was officially released and open-sourced on Hugging Face and ModelScope. The official app, web version, and mini-program have all been updated to DeepSeek-V3.2-Exp at the same time, and API prices have been significantly reduced.

According to the official introduction, DeepSeek-V3.2-Exp is an experimental version. As an intermediate step towards the next-generation architecture, V3.2-Exp introduces DeepSeek Sparse Attention (a sparse attention mechanism) on top of V3.1-Terminus, as an exploratory optimization and validation of training and inference efficiency on long contexts.

Specifically, DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, substantially improving long-context training and inference efficiency with almost no impact on output quality.
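The article does not describe DSA's internals, so the following is only a minimal, hypothetical sketch of the general idea behind sparse attention, where each query attends to a small subset of keys rather than all of them. It is a generic top-k illustration in PyTorch, not DeepSeek's actual mechanism.

```python
# Hypothetical illustration only: generic top-k sparse attention in PyTorch.
# This is NOT DeepSeek Sparse Attention (DSA); the article gives no details of DSA.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """For each query, attend only to its top_k highest-scoring keys.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5           # full score matrix (b, h, L, L)
    top_k = min(top_k, scores.size(-1))
    topk_vals, topk_idx = scores.topk(top_k, dim=-1)    # keep only the strongest links
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topk_idx, topk_vals)            # everything else is masked out
    weights = F.softmax(masked, dim=-1)                 # zero weight outside the top_k
    return weights @ v

# Tiny usage example
b, h, L, d = 1, 2, 128, 32
q, k, v = (torch.randn(b, h, L, d) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=16)
print(out.shape)  # torch.Size([1, 2, 128, 32])
```

Note that this toy version still materializes the full score matrix; the efficiency gains the article describes come precisely from avoiding that, which a real kernel would do.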

According to a post on the "Huawei Computing" WeChat official account, DeepSeek-V3.2-Exp, released and open-sourced on September 29, introduces a sparse attention architecture. The Ascend team quickly completed adaptation and deployment on inference frameworks such as vLLM and SGLang, providing day-0 support for DeepSeek-V3.2-Exp, and has open-sourced all inference code and operator implementations to developers.
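For readers who want to try the open-sourced weights themselves, a minimal serving sketch with vLLM might look like the following. The model id deepseek-ai/DeepSeek-V3.2-Exp is assumed from the Hugging Face release, and the parallelism setting is illustrative only; this is not the Ascend-adapted deployment described above.

```python
# Illustrative sketch only: serving the open-sourced weights with vLLM.
# Assumes the Hugging Face model id "deepseek-ai/DeepSeek-V3.2-Exp" and enough
# accelerator memory for the chosen tensor-parallel degree.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",
    tensor_parallel_size=8,      # adjust to the available hardware
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize what sparse attention is."], params)
print(outputs[0].outputs[0].text)
```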

DeepSeek also stated that developing the new model required designing and implementing many new GPU operators. The team used the high-level language TileLang for rapid prototyping to support deeper exploration, and in the final stage, with the TileLang implementations serving as the accuracy baseline, gradually reimplemented the operators in lower-level languages for better efficiency. The main operators open-sourced this time therefore come in both TileLang and CUDA versions, and DeepSeek recommends that the community use the TileLang-based versions for research experiments to ease debugging and rapid iteration.
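The prototype-then-optimize workflow described above amounts to checking an optimized kernel against a readable reference before trusting it. A minimal sketch of such an accuracy check is shown below; both functions are hypothetical stand-ins, not DeepSeek's actual TileLang or CUDA operators.

```python
# Hypothetical sketch: using a simple reference implementation as the accuracy
# baseline for a rewritten, "optimized" kernel, mirroring the workflow above.
import torch

def reference_op(x, w):
    # Readable "prototype" version: one plain float32 matmul.
    return x @ w

def optimized_op(x, w, tile=64):
    # Stand-in for a hand-tuned kernel: a simple tiled matmul over output blocks.
    out = torch.zeros(x.size(0), w.size(1))
    for i in range(0, x.size(0), tile):
        for j in range(0, w.size(1), tile):
            out[i:i + tile, j:j + tile] = x[i:i + tile] @ w[:, j:j + tile]
    return out

x = torch.randn(256, 512)
w = torch.randn(512, 256)

baseline = reference_op(x, w)
candidate = optimized_op(x, w)

# Require the fast path to match the baseline within tolerance before adopting it.
max_err = (baseline - candidate).abs().max().item()
print(f"max abs error vs. reference: {max_err:.2e}")
assert torch.allclose(baseline, candidate, rtol=1e-4, atol=1e-3), "accuracy regression"
```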

Thanks to the significant reduction in the service cost of the new model, the official API prices have also been correspondingly reduced. Under the new pricing policy, the cost for developers to call the DeepSeek API will be reduced by more than 50%.

DeepSeek officially released DeepSeek-V3.1 on August 21. That upgrade brought three main changes: first, a hybrid inference architecture, with one model supporting both thinking and non-thinking modes; second, higher thinking efficiency, with DeepSeek-V3.1-Think producing answers in less time than DeepSeek-R1-0528; and third, stronger agent capabilities, with post-training optimization significantly improving performance on tool use and agent tasks.
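In DeepSeek's public, OpenAI-compatible API, the two modes are selected through the model name ("deepseek-chat" for non-thinking, "deepseek-reasoner" for thinking). A minimal sketch, assuming the openai Python package and a valid API key in DEEPSEEK_API_KEY:

```python
# Minimal sketch of calling the two modes through DeepSeek's OpenAI-compatible API.
# Model names follow DeepSeek's public docs for V3.1: "deepseek-chat" (non-thinking)
# and "deepseek-reasoner" (thinking). Requires `pip install openai` and an API key.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

question = {"role": "user", "content": "Explain sparse attention in one sentence."}

# Non-thinking mode: faster, direct answers.
fast = client.chat.completions.create(model="deepseek-chat", messages=[question])
print(fast.choices[0].message.content)

# Thinking mode: the model reasons before answering.
deep = client.chat.completions.create(model="deepseek-reasoner", messages=[question])
print(deep.choices[0].message.content)
```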

On September 22, DeepSeek-V3.1 was updated to DeepSeek-V3.1-Terminus. While maintaining the model's existing capabilities, this update addressed issues reported by users, including language consistency, reducing cases of mixed Chinese and English output and occasional garbled characters, and agent capabilities, further optimizing the performance of the Code Agent and Search Agent. DeepSeek said the output of DeepSeek-V3.1-Terminus is more stable than that of the previous version.

To rigorously evaluate the impact of introducing sparse attention, the team deliberately aligned the training settings of DeepSeek-V3.2-Exp with those of V3.1-Terminus. On public benchmarks across various domains, DeepSeek-V3.2-Exp performs essentially on par with V3.1-Terminus.

This article is from Jiemian News. Reporter: Chen Xiaotong. Republished by 36Kr with authorization.