
Don't rush to go all-in on DeepSeek V4. First, take a look at the honest words of these 10 industry practitioners.

阿菜cabbage | 2026-04-30 01:07
The systematic war between models and AI applications has begun.

Text | Zhou Xinyu, Wang Yuchan

Editor | Yang Xuan

Interpreting the technical report of DeepSeek V4 has been the most feverish collective activity in the AI industry in recent days.

Is V4 powerful? On the dimension of engineering optimization, the answer is undoubtedly yes. People used to believe in the "brute-force aesthetics of the Scaling Law": improving model performance by piling on more high-quality computing power and larger parameter scales. V4, however, takes a completely different path. It defines a "restrained aesthetics of model training":

Instead of recklessly piling up computing power and parameters, it improves performance through a series of combined optimizations and reconstructions:

Attention mechanism (enabling the model to "grasp the key points", just as people automatically focus on key sentences when reading long articles)

MoE architecture (Mixture of Experts model, which can be understood as "letting different experts be responsible for different types of problems and only activating a few experts each time, saving time and effort")

Post-training (strengthening the model through targeted supplementary training after its initial formation)

Inference system engineering (optimizing the efficiency of each link during actual operation)
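The MoE idea in the list above, routing each token to only a few experts, can be sketched in a few lines. This is a generic top-k routing illustration with made-up sizes and toy expert networks, not DeepSeek's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Route a token vector x to its top_k experts and mix their outputs.

    Only top_k experts actually run, which is the MoE saving:
    total parameters are large, activated parameters stay small.
    """
    logits = x @ gate_weights                      # gate scores, shape (num_experts,)
    chosen = np.argsort(logits)[-top_k:]           # indices of the top_k experts
    probs = np.exp(logits[chosen] - logits[chosen].max())
    probs /= probs.sum()                           # softmax over the chosen experts only
    out = np.zeros_like(x)
    for p, e in zip(probs, chosen):
        out += p * np.tanh(x @ expert_weights[e])  # each "expert" is a toy one-layer net
    return out, chosen

d, num_experts = 8, 16
experts = rng.normal(size=(num_experts, d, d))
gate = rng.normal(size=(d, num_experts))
y, chosen = moe_layer(rng.normal(size=d), experts, gate, top_k=2)
print(len(chosen), "of", num_experts, "experts activated")
```

The same principle scales to V4's reported shape: a huge total parameter count, a small activated slice per token.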

The result of these efforts is that the computing power required by V4-Pro to process a long context of one million Tokens (about hundreds of thousands of words) has been reduced to 27% of that of the previous generation, V3.2. At the same time, the KV cache (which can be understood as the "scratch paper" for the model to "take notes" during a conversation) used for temporarily storing the conversation context has been compressed to 10% of its original size.
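The 27% and 10% figures are ratios, but the absolute KV-cache savings are easy to estimate with back-of-envelope sizing. The layer and head counts below are hypothetical, not V4's real configuration:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Size of the KV cache: a K and a V tensor (factor of 2) stored
    per layer and per head for every token, fp16 (2 bytes) by default."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Illustrative numbers for a 1M-token context on a made-up model shape.
full = kv_cache_bytes(seq_len=1_000_000, n_layers=60, n_kv_heads=32, head_dim=128)
compressed = full * 0.10  # the reported 10% compression ratio
print(round(full / 2**30), "GiB ->", round(compressed / 2**30), "GiB")
```

Even with these invented numbers, the cache shrinks by hundreds of gigabytes, which is why the compression ratio matters so much for million-token serving.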

However, engineering is just engineering, and rankings are just rankings.

When evaluating a model, we didn't want to stop at the paper specs. Instead, we wanted to discuss V4's value in real scenarios of deployment, development, and investment. To that end, we invited nearly 10 developers, application entrepreneurs, and investors for about three days of hands-on testing.

First, let's present a counter-intuitive conclusion: The impact of DeepSeek on the application layer may be greater than that on the model layer.

Even as the industry marvels at the extreme engineering optimization, DeepSeek itself admits in the V4 technical report that its development trajectory lags behind the leading closed-source models by about 3 to 6 months. V4's current achievements are like a deal with the devil: it has extended its strengths in inference and Agent (intelligent agent) capabilities at the cost of sacrificing some accuracy.

Closed-source model manufacturers can temporarily breathe a sigh of relief. For the business world that values stability and accuracy, V4 is obviously not a model that can be directly deployed.

Li Bojie, chief scientist of Pine AI, and Chillin, an entrepreneur behind a leading Coding Agent, both told us frankly that the stability of tool invocation and the hallucination rate must be compensated for at the harness level (the "reins" and "seat belts" of an intelligent agent, used to regulate its behavior and reduce the risk of errors). V4 cannot be deployed without "scaffolding".

However, the direction in which the intelligent "brain" iterates often shapes the downstream application ecosystem. AI application entrepreneurship will face even harsher twin tests from technology and capital.

The industry consensus that "base-model performance is still iterating rapidly" also means that applications risk being crushed by the model at any moment. An investor from a dual-currency fund cited many examples of "has-beens": "Workflow, Coding..."

Chen Weipeng, the founder and CEO of the AI application company "Yongyue Intelligence", summarized: In the future, the barrier for AI applications lies in organizing the model, Agent, product scenarios, and data feedback into a reliable, low-cost, and scalable production system.

Highlight: Not only long-text and programming capabilities, but also high capabilities at low cost

Before we start: Core advantages - code and agent capabilities

In several key code and software engineering evaluations, V4-Pro has demonstrated the highest level among current open-source models, almost on par with top-tier closed-source models. We have sorted out the core data as follows:

[Table: core code and agent evaluation data]

🧑‍🏫Huang Dongxu, co-founder and CTO of PingCAP

I'm migrating my Hermes workflow to DeepSeek V4. Previously, I was a bit wasteful, using Claude Opus and GPT5.4 as Agents. But later I found that most daily tasks don't actually require extremely high coding capabilities.

Daily office tasks mainly include: (a) organizing daily emails; (b) writing articles; (c) calendar management; (d) content summarization; (e) web browsing.

Now I've completely switched to DeepSeek V4. Its performance is better than I expected. Maybe it has been optimized for Chinese, and its overall language ability is more in line with the usage habits of native Chinese speakers than Opus and GPT.

So my first conclusion is: If you're currently using more expensive models as Agents for your daily work assistants, you can relatively safely switch to DeepSeek V4 Pro.

Its capabilities are roughly at the level of Claude Sonnet 4.5 to 4.6, but the price is less than a quarter of that of top-tier models. Now I basically don't have to worry about the cost of the Agent anymore.

The V4 paper keeps emphasizing the 1M context, but I don't really feel it's that outstanding because most current mainstream SOTA models also have at least a 1M context. It's just catching up.

Its real advantages are:

1. The cost is really very low;

2. It is an open-source model.

I don't have to worry too much that if Anthropic or OpenAI cuts off the supply, my previous workflows won't work. This kind of thing has actually happened before. In this regard, switching to DeepSeek V4 gives me a higher sense of security.

Secondly, let's talk about programming capabilities. Since the testing time is relatively short, I haven't used it to develop very complex large - scale system applications.

But for projects on the scale of a few thousand lines of code, for small applications, and for scenarios involving calls to external third-party systems (such as reading the documentation to use an unfamiliar tool on Supabase or TiDB Cloud), I haven't run into any major problems so far.

At the scale of a few thousand to ten thousand lines of code, V4's one-shot success rate (producing working code on the first attempt, without additional debugging) is relatively high.

So if you're just developing simple small websites or small applications, I think DeepSeek's programming ability is much stronger than the previous generation.

My Harness framework is not heavily hand-orchestrated; it mainly relies on the model's own collaborative ability (using Slock.ai).

To put it simply, there are the following two points:

1. It can collaborate with Agents using other models;

2. It can complete some simple/specific tasks.

So, if there are some stronger models (such as GPT5.5) to guide DeepSeek V4 Pro and then let it be responsible for execution, I think this model can significantly reduce the cost of the entire Harness Engineering.

🧑‍🏫Zhao Binqiang, vice-president of the Technology and Product Center of Lingyi Wanwu

DeepSeek V4 is not the "most all-around", but it is the "most trustworthy": its firm commitment to open-source, complete technical report, extremely low inference cost, and localization of the entire technology stack make it the most cost-effective base-model choice for ToB (enterprise-oriented) scenarios.

Two things about DeepSeek V4 really amazed me.

First, the underlying innovation of the model architecture. It still maintains high-quality inference ability under a 1-million-Token context window, which is due to the underlying innovation of the hybrid attention mechanism. This mechanism can be popularly understood as: "skimming" grasps the overall meaning, while "intensive reading" accurately understands the details.
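One generic way to combine "skimming" and "intensive reading" is an attention mask that unions a local sliding window (full detail on nearby tokens) with sparse strided global positions (coarse coverage of the whole context). This is an illustrative sketch of that general pattern, not the actual CSA + HCA design:

```python
import numpy as np

def hybrid_mask(seq_len, window, global_stride):
    """Boolean attention mask: True means position i may attend to position j.

    Local window = "intensive reading" of recent detail;
    strided global anchors = "skimming" the overall context.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i                      # no attending to the future
    local = (i - j) < window             # recent tokens in full detail
    global_ = (j % global_stride) == 0   # sparse anchors across the whole context
    return causal & (local | global_)

m = hybrid_mask(16, window=4, global_stride=8)
print(m.shape)
```

Each row attends to only O(window + seq_len/global_stride) positions instead of O(seq_len), which is where the long-context compute savings come from.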

Especially, its exploration in Context compression is very advanced, and DeepSeek has openly disclosed the details in the technical report without reservation. This kind of honesty and open-source spirit is extremely valuable in the highly competitive large-model industry.

Second, full-stack adaptation to domestic computing power. DeepSeek has completed the adaptation of Huawei Ascend 910B/950 and has done very detailed work in aspects such as quantization, the sparsification mechanism, and domain-expert optimization.

This means that the domestic full - stack solution from chips to underlying software to model training and inference has taken a substantial step in the right direction. Although we can't say that we have completely got rid of the dependence on the Nvidia ecosystem, we have found the right development direction. The difficulty and significance of this cannot be overemphasized.

🧑‍🏫Li Bojie, chief scientist of Pine AI

What amazed me the most is that DeepSeek has successfully implemented a long list of architectural innovations, including MoE, CSA + HCA hybrid attention, mHC, Muon, and FP4QAT, at the current largest open-source scale of 1.6T (1.6 trillion parameters).

It's like successfully combining a bunch of technologies that are theoretically advanced but often fail in small-scale experiments into a giant engine and making it run stably. We've tried more than 20 architectural innovations ourselves, and the conclusion is almost always that "it works at a 7-billion-parameter scale, but fails or even has a negative effect when scaled up".

Most other model-architecture innovations are also stuck at this stage. Being able to make multiple innovations work together at the largest scale shows that DeepSeek has extremely deep technical accumulation in underlying training. The "mHC" technique alone reduced a nearly 3000-fold signal amplification in the 27B experiment to about 1.6x, making training stable and controllable.

🧑‍🏫Song Chunyu, vice-president of Lenovo Group, chief investment officer and senior partner of Lenovo Capital and Incubation Group

DeepSeek has proven that "AI cost-effectiveness" can be an actively designed structural advantage.

It reduces compute to 27% and GPU memory (VRAM) usage to just 10%. Meanwhile, although its total parameter count is 1.6T, only 49B parameters are activated at a time, which is extremely efficient.

This structural cost reduction, combined with the V4-Flash API's low price of 1 yuan per million Tokens, has made "affordable ultra-long context" a new benchmark for AI applications.
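At 1 yuan per million Tokens, token spend becomes almost negligible for many products. A quick illustrative calculation (the traffic numbers are made up, only the price comes from the article):

```python
PRICE_PER_M_TOKENS_YUAN = 1.0  # stated V4-Flash API price

def monthly_cost(requests_per_day, tokens_per_request, days=30):
    """Monthly API bill in yuan for a hypothetical workload."""
    tokens = requests_per_day * tokens_per_request * days
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS_YUAN

# A made-up product: 10,000 requests/day, 2,000 tokens each.
print(monthly_cost(10_000, 2_000))  # -> 600.0 yuan per month
```

At this order of magnitude, inference pricing stops being the binding constraint on product design, which is the point the interviewees keep returning to.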

🧑‍🏫Chen Weipeng, founder and CEO of Yongyue Intelligence

What really excited me about DeepSeek V4 is not just the improvement of a single-point ability, but the fact that it shows domestic large models have moved from "catching up on base capabilities" to "participating in the system competition of the Agent era".

In the past, people were more concerned about whether the model could answer, reason, and write code. But today, what really matters is whether the model can stably achieve the goal in complex tasks and whether it can be integrated into real - world product systems at a low enough cost and high enough efficiency.

Pity: V4 still lacks some "scaffolding" for real-world deployment

Before we start: Relative disadvantages - factual knowledge and extremely complex reasoning

DeepSeek's official materials and various evaluation platforms have pointed out several obvious weaknesses of V4-Pro. For a more intuitive understanding, we have sorted out the key weak-point data in the following table:

[Table: key weak-point data]

🧑‍🏫Li Bojie, chief scientist of Pine AI

I mainly use it for code - related and Agentic tasks. In this type of work:

The tool invocation ability and general world knowledge of V4-Pro have basically caught up with the second-tier versions of leading models (roughly equivalent to the level of Claude 4.6 Sonnet);

However, the stability of tool invocation and the hallucination rate are still major flaws. These two points must be compensated for at the Agent Harness level (such as strengthening verification, automatic retry after failure, using an external knowledge base to keep the model "grounded", and setting strict, clear tool-use specifications). Otherwise, in long-chain tasks, errors will be continuously magnified as the task chain lengthens;

Once these two defects are compensated at the Harness layer, the overall inference cost can be several times lower than that of leading models. This is the real leverage.
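The harness-level compensation described above (external verification, automatic retry) can be sketched as a generic wrapper around a tool call. This illustrates the pattern only; it is not Pine AI's actual harness, and all names are hypothetical:

```python
import time

def call_with_harness(tool, args, verify, max_retries=3, backoff=1.0):
    """Run a model-issued tool call with harness-level safeguards:
    check the result with an external verifier, retry on failure,
    and fail loudly instead of letting an error propagate silently
    down a long task chain."""
    last_error = None
    for attempt in range(max_retries):
        try:
            result = tool(**args)
            if verify(result):               # external check, not the model's own claim
                return result
            last_error = ValueError(f"verification failed: {result!r}")
        except Exception as exc:             # the harness absorbs tool errors
            last_error = exc
        time.sleep(backoff * (attempt + 1))  # simple linear backoff before retrying
    raise RuntimeError(f"tool call failed after {max_retries} attempts") from last_error
```

A real harness would layer logging, knowledge-base grounding, and per-tool schemas on top of this skeleton, but even the bare wrapper shows where the extra reliability (and the extra engineering cost) lives.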

Another aspect is that V4-Flash is a very good "sweet spot" for vertical fine-tuning. What is vertical fine-tuning? It means using professional data from a specific domain to "supplement training" on top of a general model, turning it into an expert in a given industry.

The cost of post-training (SFT/RL) an ultra-large model with 1.6 trillion parameters is too high for most companies to afford. Models with 200-300 billion parameters are the mainstream size for post-training in the market. We previously did post-training on Qianwen 235B (235 billion parameters), and the effect was significantly weaker than that of V4-Flash at the same size.

The performance of Flash has caught up with the previous-generation trillion-level open-source models, surpassing DeepSeek V3.2 with more than 600B parameters and the old-version Kimi. Flash will become the preferred base for business fine-tuning.

🧑‍🏫Chillin, an entrepreneur of Coding Agent

Our internal evaluation conclusion is that in the Coding Agent scenario, DeepSeek V4 is at the level of Claude more than a year ago.

The problems may lie in two aspects: one is the parameter scale, and the other is the data. There is still a significant gap between DeepSeek and Anthropic.

If it is to be truly deployed, DeepSeek V4 still needs some special scaffolding, such as SWE-Agent (Software Engineering Agent), OpenHands (an open-source Coding Agent), Claude Code, and OpenClaw. These require additional configuration by developers.

🧑‍🏫Chen Weipeng, founder and CEO of Yongyue Intelligence

From the actual use of Loopit (an AI interactive content product under Yongyue Intelligence, mainly in the Coding scenario), objectively, there is still a gap between DeepSeek V4 and the strongest overseas closed-source models in terms of the stability and task completion rate of executing complex long-range tasks.

The ability gap between domestic top-tier models is narrowing. This indicates that model competition is entering a new stage: in the Agent era, whether a model can understand long contexts, adapt to complex frameworks, stably complete long-range tasks, and operate at an acceptable cost and speed will become equally important.

What really makes a difference is not just the model itself, but the overall system formed by the model, post - training, Agent framework, evaluation system, and engineering efficiency.

🧑‍🏫Song Chunyu, vice-president of Lenovo Group, chief investment officer and senior partner of Lenovo Capital and Incubation Group

The release of V4 does not include a native multi-modal version (a model that can process text, images, sounds, etc. simultaneously), which is a bit of a pity in the current market environment.

However, combined with its strategy of fully embracing domestic computing power, this is likely