Going viral: DeepSeek V4's cost plummets by 73%, Liang Wenfeng joins hands with Huawei and Cambricon, and the returning "Yuanshen" gets a standing ovation.
According to an April 24 report by Zhidongxi, DeepSeek today officially released and open-sourced preview versions of the DeepSeek-V4 series, its new flagship model family following V3.2. Zhidongxi immediately ran hands-on tests.
The return of the DeepSeek V4 "Yuanshen" made an immediate splash: it dominated online discussion almost instantly and took three of the top five spots on the Weibo hot-search list, second only to the Xiaomi YU7GT.
This release includes two models, DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both use a Mixture-of-Experts (MoE) architecture, with total parameter scales of 1.6T (49B activated) and 284B (13B activated) respectively, and both support a maximum context of 1 million tokens.
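As a rough illustration of how sparse these MoE configurations are, the snippet below computes the fraction of parameters active per token from the announced counts. The parameter figures come from the announcement; the code structure itself is purely illustrative.

```python
# Activation ratios of the two announced DeepSeek-V4 MoE configurations:
# only a small fraction of total parameters participates in any one forward pass.

CONFIGS = {
    "DeepSeek-V4-Pro":   {"total_b": 1600, "active_b": 49},  # 1.6T total, 49B active
    "DeepSeek-V4-Flash": {"total_b": 284,  "active_b": 13},  # 284B total, 13B active
}

def activation_ratio(cfg):
    """Fraction of parameters used per token in an MoE forward pass."""
    return cfg["active_b"] / cfg["total_b"]

for name, cfg in CONFIGS.items():
    print(f"{name}: {activation_ratio(cfg):.1%} of parameters active per token")
```

Both models activate under 5% of their weights per token, which is what lets the Pro model carry 1.6T parameters while keeping per-token compute closer to a 49B dense model.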
DeepSeek also stated that, constrained by high-end computing power, the serving throughput of DeepSeek-V4-Pro is currently very limited; it expects the price to drop significantly after Ascend 950 super-nodes ship at scale in the second half of the year. In addition, DeepSeek-V4 received Day 0 adaptation support from Cambricon, and the adaptation code has been open-sourced on GitHub.
DeepSeek-V4-Pro targets the performance ceiling, competing with closed-source flagship models, while DeepSeek-V4-Flash sharply cuts parameter and activation scale in exchange for lower latency and lower cost.
Compared with the previous generation, V4 further improves Agent capabilities, world knowledge, and complex reasoning, and for the first time ships the million-token context as a default capability.
DeepSeek-V4-Pro's Agent capabilities are significantly enhanced: it ranks among the top open-source models in evaluations such as Agentic Coding. Internal evaluations put its delivery quality close to Claude Opus 4.6 in non-thinking mode, though a gap remains against that model's thinking mode.
On high-difficulty tasks such as mathematics, STEM, and competitive coding, DeepSeek-V4-Pro outperforms the currently publicly evaluated open-source models, and its overall performance approaches, or even matches, top-tier closed-source models such as GPT-5.4 and Claude Opus 4.6-Max.
Meanwhile, DeepSeek-V4 applies a more aggressive set of long-context optimizations: in the 1-million-token scenario, per-token inference computation is only 27% of V3.2's, and KV cache occupancy drops to about 10%, significantly cutting the compute and GPU-memory costs of long-chain tasks.
At the same time, DeepSeek announced API pricing for the V4 series. DeepSeek-V4-Pro: 1 yuan per million input tokens on a cache hit, 12 yuan per million on a cache miss, and 24 yuan per million output tokens. DeepSeek-V4-Flash: just 0.2 yuan, 1 yuan, and 2 yuan per million tokens, respectively.
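To make the pricing concrete, here is a minimal cost estimator using the per-million-token prices quoted above. The prices are from the announcement; the function name, interface, and the 540,000-token example request are illustrative, not an official SDK.

```python
# Sketch: estimating DeepSeek-V4 API cost (in yuan) from the announced
# per-million-token prices. Cache hits are billed at the cheaper input rate.

PRICES = {  # yuan per million tokens
    "deepseek-v4-pro":   {"input_hit": 1.0, "input_miss": 12.0, "output": 24.0},
    "deepseek-v4-flash": {"input_hit": 0.2, "input_miss": 1.0,  "output": 2.0},
}

def estimate_cost(model, input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Return the estimated cost in yuan for one request mix."""
    p = PRICES[model]
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    return (hit * p["input_hit"] + miss * p["input_miss"]
            + output_tokens * p["output"]) / 1_000_000

# Example: a 540,000-token input (the "Three-Body" read mentioned later)
# plus 2,000 output tokens on V4-Pro, assuming no cache hits.
print(round(estimate_cost("deepseek-v4-pro", 540_000, 2_000), 2))
```

At these rates, even the full-trilogy read costs only a few yuan, and routing repeated prefixes through the cache cuts input cost by 12x on Pro and 5x on Flash.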
The DeepSeek-V4 series is now live on the official website and app, and the API and model weights are also available.
Experience address:
chat.deepseek.com or DeepSeek official APP
API documentation:
https://api-docs.deepseek.com/zh-cn/guides/thinking_mode
Open-source links:
https://huggingface.co/collections/deepseek-ai/deepseek-v4
https://modelscope.cn/collections/deepseek-ai/DeepSeek-V4
Technical report:
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
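For developers, a minimal call sketch follows, assuming the V4 API keeps the OpenAI-compatible chat-completions interface that earlier DeepSeek releases exposed at api.deepseek.com. The model identifier "deepseek-v4-flash" is a guess; check the API documentation linked above for the real name.

```python
# Sketch of an OpenAI-compatible chat-completions request to the DeepSeek API.
# ASSUMPTIONS: the V4 endpoint and model id below are illustrative guesses.
import json
import os
import urllib.request

def build_request(prompt, model="deepseek-v4-flash"):
    """Construct (but do not send) the HTTP request for one chat turn."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.deepseek.com/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Read the key from the environment; never hard-code credentials.
            "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
        },
    )

req = build_request("Summarize the Three-Body trilogy in one sentence.")
print(req.full_url)
# To actually send: urllib.request.urlopen(req) with a valid DEEPSEEK_API_KEY.
```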
01. Significant improvement in agentic programming ability; reading the "Three-Body" trilogy consumes 540,000 tokens
To get a first feel for DeepSeek-V4, we mainly tested DeepSeek-V4-Pro.
In a one-shot front-end web page task, DeepSeek-V4-Pro executed with high efficiency. Since our requirements were simple, the model thought for only 5 seconds before starting development, a marked contrast to earlier DeepSeek models, which spent many tokens on thinking.
Once generation began, DeepSeek-V4-Pro's output was noticeably longer than that of other DeepSeek models. Generation was fairly fast, streaming roughly five lines of code at a time.
The final result is shown below: the web page is more complete than DeepSeek-V3.2's, and the design is more varied.
The website created by DeepSeek-V4-Pro: https://mcp.edgeone.site/share/9pD1cRzY1QA8bmmBLDZ8S
Such simple programming problems are no longer a challenge for DeepSeek-V4-Pro, however, so we gave it a task combining Agent capabilities with programming: plan a trip to Shanghai, integrate all the relevant information into a travel website, and attach each attraction's location.
During execution, DeepSeek-V4-Pro performed complex multi-round tool calls, retrieved more search results than earlier models, and gathered information more comprehensively.
In the end, DeepSeek-V4-Pro collected a complete itinerary, produced a reasonable plan, and attached each attraction's location, which opens directly in a navigation app when tapped. Throughout the Agent task its actions were decisive: tool calls and thinking resolved within seconds, with good token efficiency.
The travel plan DeepSeek built with its Agent and programming capabilities: https://mcp.edgeone.site/share/4TxFYOy24bgaEwxFoxisj
Our next case involved long text. DeepSeek claims the V4 series can ingest the entire "Three-Body" trilogy at once, so we uploaded the complete trilogy to test the claim.
After the upload, DeepSeek quickly located the specified content, successfully finding the needle in the haystack. This extreme long-context ability comes at a cost, however: producing that small amount of output consumed 540,000 tokens.
We also asked "Which model has OpenAI updated to?" to probe the knowledge cutoff; DeepSeek-V4-Pro's knowledge still stops in 2025.
The model also does not appear to support vision yet: uploaded images are handled via text extraction, and images without text are reported as unprocessable.
02. Million-token context becomes standard, and the new architecture cuts the "long-task cost"
The most direct change in this generation is making long context a default capability.
Rather than simply widening the window, DeepSeek-V4-Pro introduces a new hybrid attention architecture that combines Compressed Sparse Attention with High-Compression Attention (HCA), working with DSA sparse attention to compress along the token dimension.
The model also introduces Manifold-Constrained Hyper-Connection (mHC) to strengthen the traditional residual connection, and uses the Muon optimizer to improve convergence speed and training stability. Together, these designs let the model "remember longer" while keeping compute costs under control.
According to official figures, at a 1-million-token context the per-token inference TFLOPs of DeepSeek-V4-Pro drop by roughly 3.7x to 9.8x compared with DeepSeek-V3.2, and KV cache occupancy drops by 9.5x to 13.7x.
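These reduction factors line up with the percentages quoted earlier in the article: an N-fold reduction means the remaining cost is 1/N of the old one. The short sketch below, using only the officially quoted factors, makes the conversion explicit.

```python
# Converting the official "N-times reduction" figures for DeepSeek-V4-Pro
# into the fraction of V3.2's cost that remains (1/N).

def relative_cost(reduction_factor):
    """Fraction of the old cost remaining after an N-fold reduction."""
    return 1.0 / reduction_factor

# TFLOPs factors (3.7x-9.8x) and KV cache factors (9.5x-13.7x) from the report.
for factor in (3.7, 9.8, 9.5, 13.7):
    print(f"{factor:>5}x reduction -> {relative_cost(factor):.0%} of V3.2 cost")
```

A 3.7x drop leaves about 27% of the compute and a 9.8x drop about 10%, matching the "27% computation, ~10% KV cache" figures cited for the 1-million-token scenario.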
This means long-chain tasks that were previously hard to run, such as multi-round Agent planning and long-document processing, are becoming practical.
03. Reasoning, knowledge, and code capabilities improve together; open-source models approach the closed-source ceiling
Structurally, DeepSeek-V4-Pro's gains span reasoning, knowledge, and Agent capabilities simultaneously.
On knowledge and reasoning tasks, it outperforms current mainstream open-source models in evaluations such as SimpleQA, Apex, and Codeforces, and comes close to GPT-5.4 and Gemini 3.1 Pro on many of them. For example, it scores 90.2 on Apex Shortlist, exceeding top-tier closed-source models, and it maintains a top-tier level on competitive tasks such as Codeforces.
On Agent-related tasks, DeepSeek-V4-Pro is steady on benchmarks such as SWE Verified and Terminal Bench. Its SWE Verified score reaches 80.6, close to Claude Opus 4.6 and well above most open-source models; on Terminal Bench 2.0 it also beats models such as GLM-5.1 Thinking and Kimi K2.6 Thinking.
Overall, DeepSeek - V4 - Pro is currently the "ceiling" of open - source models.
04. Special optimization for Agent capabilities, refined around real-world workflows
This generation of DeepSeek-V4 significantly strengthens its adaptation to Agent scenarios. It conducts special optimization for mainstream Agent frameworks such as Claude Code, OpenClaw, and CodeBuddy and performs more st