DeepSeek aims to adopt the business strategy of Mixue Ice Cream & Tea to create a Chinese version of Claude Code.
DeepSeek is to large models what Mixue is to milk tea. You don't have to agonize over value for money, because the capabilities are strong and the price is easy on the wallet.
Recently, DeepSeek officially announced that the API price of the DeepSeek-V4-Pro model will be permanently reduced. DeepSeek also stated that it has sped up API output and expanded service capacity. The API is now faster and more stable, and supports 500 concurrent requests by default. Enterprise users can apply online for higher concurrency.
By releasing models, offering discounts, lowering the cache hit price, and finally turning temporary promotions into long-term prices, DeepSeek is rewriting the price benchmark for large-model APIs. The next stop after low-price models is likely to be Agents.
DeepSeek Permanently Cuts Prices, Liang Wenfeng Slashes Token Prices
Let's first briefly sort out the timeline of DeepSeek's price cuts:
- On April 24th, the preview version of DeepSeek V4 was officially released.
- On April 25th, DeepSeek announced a 75% discount on V4-Pro.
- On April 26th, DeepSeek announced that the cache hit price was adjusted to one-tenth of the initial launch price.
- On April 28th, DeepSeek announced that the 75% discount on V4-Pro was extended until May 31st.
- On May 22nd, DeepSeek announced that the price of V4-Pro was permanently reduced to one-fourth of the original price.
The key point of this timeline is that a temporary discount has become a permanent price cut. After the adjustment, DeepSeek-V4-Pro's input price on a cache hit dropped from 0.1 yuan to 0.025 yuan per million tokens, its input price on a cache miss dropped from 12 yuan to 3 yuan, and its output price dropped from 24 yuan to 6 yuan. Combined with the default concurrency of 500 and the faster service, the official API has become more attractive to developers and enterprises.
https://api-docs.deepseek.com/zh-cn/quick_start/pricing
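To make the effect of these prices concrete, here is a minimal sketch that applies the old and new per-million-token prices quoted above to a single request. The token counts in the example are hypothetical and chosen only for illustration, and the helper names (`Pricing`, `request_cost`) are not part of any DeepSeek SDK.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # prices are quoted per million tokens


@dataclass(frozen=True)
class Pricing:
    """Prices in yuan per million tokens."""
    cache_hit: float
    cache_miss: float
    output: float


# Figures quoted in the article for DeepSeek-V4-Pro.
OLD = Pricing(cache_hit=0.1, cache_miss=12.0, output=24.0)
NEW = Pricing(cache_hit=0.025, cache_miss=3.0, output=6.0)


def request_cost(p: Pricing, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Cost in yuan of one request, split by billing category."""
    total = (
        hit_tokens * p.cache_hit
        + miss_tokens * p.cache_miss
        + output_tokens * p.output
    )
    return total / TOKENS_PER_UNIT


# Hypothetical coding request: 150k cached input, 50k fresh input, 8k output.
old_cost = request_cost(OLD, 150_000, 50_000, 8_000)
new_cost = request_cost(NEW, 150_000, 50_000, 8_000)
print(f"old: {old_cost:.4f} yuan, new: {new_cost:.4f} yuan, ratio: {new_cost / old_cost:.2f}")
```

Because all three price components were cut to one-fourth, the ratio comes out at 0.25 regardless of how a request splits between cached input, fresh input, and output.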
The most direct impact of the price cut is to push per-task cost to the forefront of developers' decision-making.
In coding scenarios, a single task may involve reading project files, analyzing logs, multiple rounds of modification, and repeated test runs, so token consumption can climb quickly.
High-consumption scenarios such as long-context processing, codebase analysis, batch refactoring, automated testing, and multi-round Agent execution are now more affordable for individual developers and small teams.
In the past, developers chose Claude, OpenAI, or Gemini mainly based on model capabilities, stability, ecosystem, and usage habits. DeepSeek's large, permanent price cut means those habits can easily change in the face of a decisive cost-performance advantage.
Following this line, DeepSeek's consistent market role becomes clearer: it keeps building a price advantage in the large-model market through low prices, open source, and strong reasoning capabilities. For domestic model vendors, the permanent price cut of V4-Pro amounts to redrawing the API pricing line.
Vendors such as Zhipu, MiniMax, and Moonshot AI (Yuezhi Anmian), which also rely on API fees and target developers and enterprise customers, will undoubtedly face pressure. In contrast, leading overseas models such as Claude, OpenAI, and Gemini will see relatively limited short-term impact because of differences in market, customer structure, and ecosystem position.
However, if DeepSeek later launches a coding tool similar to Claude Code and supports high-frequency calls at low token costs, price-sensitive developers will be more easily attracted.
Liang Wenfeng's previous explanation of DeepSeek's pricing philosophy can also be understood in today's context.
When DeepSeek V2 cut prices in 2024, Liang Wenfeng said that DeepSeek simply follows its own rhythm, calculates its costs, and sets prices on the principle of neither losing money nor making excessive profits. He also said that part of the price cut came from cost reductions achieved while exploring the next-generation model structure, and part from the belief that APIs and AI should be inclusive and accessible to everyone.
Rather than treating the API as a high-margin revenue channel, DeepSeek seems to be using its strong infrastructure capabilities to reduce inference costs and then attract developers, applications, and downstream ecosystems with low prices.
@bookwormengr, a blogger on the X platform, recently gave a more radical explanation in a long article titled "DeepSeek's 10 trillion USD grand strategy".
He believes that DeepSeek's real goal may not be to compete with Zhipu, Moonshot AI, or MiniMax, nor to quickly fill out its product lines in multimodality, voice, and video. Instead, it is to promote a cheaper and more decentralized AI hardware ecosystem by continuously reducing the resource requirements for training and inference.
In his view, DeepSeek's long-term value lies not only in the model itself but also in enabling more domestic storage, GPUs, ASICs, network chips, and heterogeneous hardware to enter the large-model training and inference system.
This judgment may not be fully realized, but it explains the direction behind DeepSeek's series of choices:
MoE, MLA, DSA, GRPO, RLVR, KV Cache compression, Dual Path, TileLang. On the surface, they are optimizations of model architecture and inference engineering. In essence, they are all aimed at reducing dependence on high-end HBM, top-tier GPUs, and the CUDA ecosystem.
Across the series of price-cut announcements, what deserves the most attention is not only the drop in output price but also the drop in cache hit price.
In large-model inference, the KV Cache is a key cost item. When the model processes a long context, it needs to store the Keys and Values corresponding to historical tokens and reuse them during subsequent generation. The longer the context, the more cache must be saved and read, and the greater the pressure on GPU memory, bandwidth, and storage systems.
In ordinary chats, the cache pressure may not be obvious. However, when dealing with code, long documents, and Agent tasks, the cost structure will change rapidly. @bookwormengr specifically calculated the KV Cache cost in his long article.
Assuming 1 million tokens of context, 8-bit KV precision, and 16-bit index precision, he estimated that DeepSeek V4 requires only about 5.48GB of HBM, while GLM5 requires about 60GB and Qwen3-235B-A22B requires about 89GB.
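For readers who want to see where numbers of this magnitude come from, the sketch below applies the standard sizing formula for a plain, uncompressed KV cache: two vectors (Key and Value) per token, per layer, per KV head. The Qwen3-235B-A22B configuration used here (94 layers, 4 KV heads, head dimension 128) is an assumption taken from the publicly listed model config, not from this article, and the blogger's exact method is not given here. The sketch does not model DeepSeek's compression or the 16-bit index, so it cannot reproduce the 5.48GB figure.

```python
GIB = 1024 ** 3


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 1) -> int:
    """Size of a plain (uncompressed) KV cache for one sequence.

    Each token stores one Key and one Value vector per layer and per KV head.
    bytes_per_value=1 corresponds to 8-bit KV precision.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


# ASSUMED config for Qwen3-235B-A22B (not stated in the article).
qwen_bytes = kv_cache_bytes(tokens=1_000_000, layers=94, kv_heads=4, head_dim=128)
qwen_gib = qwen_bytes / GIB
print(f"Plain KV cache, assumed Qwen3-235B-A22B config: {qwen_gib:.1f} GiB")

# Ratios between the figures quoted in the article (GB per 1M tokens).
quoted_gb = {"DeepSeek V4": 5.48, "GLM5": 60.0, "Qwen3-235B-A22B": 89.0}
baseline = quoted_gb["DeepSeek V4"]
for name, gb in quoted_gb.items():
    print(f"{name}: {gb} GB, {gb / baseline:.1f}x DeepSeek V4")
```

Under these assumptions the formula lands close to the roughly 89GB quoted for Qwen3-235B-A22B, which suggests the quoted figures count a full per-token cache. By the quoted numbers, the other two models need roughly 11 to 16 times as much HBM as DeepSeek V4 at the same context length.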
The truly expensive part of long-context and Agent tasks is not only model generation itself but also cache, GPU memory, bandwidth, and repeated context transfer.
When a Code Agent processes a project, it may need to repeatedly read the same codebase structure, the same batch of files, the same task history, the same set of system prompts, and the same batch of test logs. If each round is billed at the full price for the entire context, long-running tasks quickly become expensive. With the cache hit price reduced, the cost of repeated context falls significantly.
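The effect of cache hits on a multi-round Agent session can be sketched with simple arithmetic. The model below is idealized and rests on assumptions: every token sent in an earlier round is billed at the cache hit price in later rounds, only newly added tokens are billed at the miss price, and the cache never expires. The session shape (20 rounds, a 100k-token shared prefix, 2k new tokens per round) is hypothetical, while the prices are the post-cut V4-Pro input prices quoted earlier.

```python
HIT_PRICE = 0.025   # yuan per million input tokens on a cache hit
MISS_PRICE = 3.0    # yuan per million input tokens on a cache miss
PER_MILLION = 1_000_000


def session_input_cost(rounds: int, prefix_tokens: int, new_tokens_per_round: int,
                       use_cache: bool) -> float:
    """Input cost in yuan of a session whose context grows every round.

    Round r (1-based) sends the shared prefix plus r * new_tokens_per_round
    tokens. With caching, everything already sent in a previous round is a hit.
    """
    hit_tokens = 0
    miss_tokens = 0
    for r in range(1, rounds + 1):
        context = prefix_tokens + r * new_tokens_per_round
        if use_cache and r > 1:
            cached = prefix_tokens + (r - 1) * new_tokens_per_round
        else:
            cached = 0
        hit_tokens += cached
        miss_tokens += context - cached
    return (hit_tokens * HIT_PRICE + miss_tokens * MISS_PRICE) / PER_MILLION


# Hypothetical session: 20 rounds, 100k-token prefix, 2k new tokens per round.
no_cache = session_input_cost(20, 100_000, 2_000, use_cache=False)
with_cache = session_input_cost(20, 100_000, 2_000, use_cache=True)
print(f"without cache: {no_cache:.3f} yuan, with cache: {with_cache:.3f} yuan")
```

In this idealized session the same 2.42 million input tokens cost 7.26 yuan when nothing is cached and about 0.48 yuan when repeated context hits the cache. Real savings depend on how the provider matches prefixes and how long cached entries live.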
DeepSeek's continuous investment in MoE architecture, long-context processing, KV Cache compression, and inference efficiency in recent years is obvious to all. The price cut is an inevitable result of technological iteration and will completely disrupt the AI programming market landscape.
Why Must There Be a Chinese Version of "Claude Code"?
The first thing to be affected is the subscription model of AI programming tools.
Mainstream AI programming tools have launched monthly subscription Coding Plans that give users entitlements such as code completion, model invocation, and Agent execution. In the era of lightweight completion, consumption per invocation was extremely low.
However, AI programming has evolved from one-off completion to end-to-end automated coding by Agents. The model can independently modify code, run tests, and fix errors, which significantly increases token consumption per task.
When underlying API prices are also cut significantly, the Coding Plan must find new sources of value. These are more likely to lie in engineering capabilities. For example, can the tool better understand project structure, accurately select context, control token consumption, modify code reliably, work with Git, terminals, and CI/CD, and manage permissions and audit records in an enterprise environment?
API relay services also need to reposition themselves. For individual developers, affordability and usability still matter. But for enterprises, stability, auditability, controllability, and portability are more crucial.
Following this logic, the changes to Coding Plans and relay services are only the surface. The more interesting question after the price cut is who actually controls the developer entry point.
Google CEO Sundar Pichai recently gave an interview to "Hard Fork". He said Google is very competitive in text, multimodality, voice, reasoning, and overall intelligence, but publicly acknowledged for the first time that it still lags in agentic coding, especially in tool invocation, instruction following, and long-horizon tasks.
He also mentioned that the more crucial thing is to use the model in the real world, let the data flow back, and continue to iterate. Pichai specifically said that coding is an area that needs to deal with data flows.
Terminal tools can observe how developers assign tasks, ask questions, accept or reject suggestions, and ask the model to keep fixing problems. They can also determine whether an Agent run completed its task from test results, terminal logs, file changes, and Git commits. This type of data is very valuable for coding models and Agent products.
From the public recruitment actions, DeepSeek has become more active around Agents recently.
Its job openings include roles such as Agent deep-learning algorithm researchers, Agent data strategy engineers, product managers, and R&D engineers. More importantly, Chen Deli, a senior researcher at DeepSeek, posted a recruitment notice himself, saying that he wants to build a Code Harness from scratch.
As he said, Model + Harness = Agent. In Agent products, the model is responsible for understanding and generation, and the Harness is responsible for bringing the model's capabilities into the real engineering environment, which is equivalent to the "execution system" outside the model.
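The division of labor in "Model + Harness = Agent" can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not DeepSeek's design: the `Harness` class, the action dictionary format, and the two file tools are all hypothetical, and `scripted_model` is a hard-coded stand-in for a real model call.

```python
import tempfile
from pathlib import Path
from typing import Callable

Action = dict                      # hypothetical action format chosen for this sketch
Model = Callable[[list], Action]   # a model maps the transcript to the next action


class Harness:
    """Executes the model's requested actions inside a working directory."""

    def __init__(self, workdir: Path, max_steps: int = 8):
        self.workdir = workdir
        self.max_steps = max_steps  # hard cap so a confused model cannot loop forever
        self.tools = {"write_file": self.write_file, "read_file": self.read_file}

    def write_file(self, name: str, text: str) -> str:
        (self.workdir / name).write_text(text)
        return f"wrote {len(text)} chars to {name}"

    def read_file(self, name: str) -> str:
        return (self.workdir / name).read_text()

    def run(self, model: Model, task: str) -> list:
        transcript = [{"role": "task", "content": task}]
        for _ in range(self.max_steps):
            action = model(transcript)
            if action["type"] == "finish":
                transcript.append({"role": "result", "content": action["content"]})
                break
            tool = self.tools.get(action["tool"])
            observation = tool(**action["args"]) if tool else f"unknown tool: {action['tool']}"
            transcript.append({"role": "observation", "content": observation})
        return transcript


def scripted_model(transcript: list) -> Action:
    """Stand-in for a real model: write a file, read it back, then finish."""
    step = len(transcript)
    if step == 1:
        return {"type": "tool", "tool": "write_file",
                "args": {"name": "hello.py", "text": "print('hi')\n"}}
    if step == 2:
        return {"type": "tool", "tool": "read_file", "args": {"name": "hello.py"}}
    return {"type": "finish", "content": transcript[-1]["content"]}


with tempfile.TemporaryDirectory() as tmp:
    log = Harness(Path(tmp)).run(scripted_model, "create hello.py and show it")

for entry in log:
    print(entry)
```

The model only decides what to do next. Everything that touches the real environment, including the step limit and the set of allowed tools, lives in the harness, which is also where permissions and audit logging would naturally sit.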
A DeepSeek version of Claude Code should give developers not just a dialog box but an engineering system that can continuously execute tasks.
The attention Cui Tianyi drew after joining DeepSeek is also related to the engineering nature of the Code Agent.
Public information shows that Cui Tianyi graduated from the Department of Computer Science at Zhejiang University. He was admitted to Zhejiang University through the informatics competition, won the gold medal in the ACM Asia Regional Contest six times, worked at Jane Street for 9 years, and co-founded TSY Capital.
The difficulty of a Code Agent lies not only in generating code but also in continuously executing tasks in real projects. Quantitative trading systems have long emphasized low latency, stability, automated execution, and risk control, and that engineering paradigm is at least similar to what an Agent Harness requires.
The product capabilities of Agent tools include not only code writing but also permissions, auditing, data isolation, and security policies.
This, in turn, provides opportunities for domestic models like DeepSeek. If DeepSeek can combine low-cost models, Code Harness, local deployment, and enterprise-level permission control, it will have stronger substitution value