
DeepSeek-V4: The Singularity of the Cambrian Explosion of AI Applications in China Has Arrived

Silicon-based Starlight · 2026-04-24 17:44
Not tempted by praise, not frightened by slander; follow the right path and hold yourself upright.

Near noon on April 24th, the official WeChat account of DeepSeek released an announcement titled "DeepSeek-V4 Preview Version: Entering the Era of Universal Access to One Million Contexts". The long-awaited V4 has finally arrived.

In my opinion, the sentence at the very end of this announcement article is more important than all the previous benchmark data:

"Not tempted by praise, not frightened by slander; follow the right path and hold yourself upright."

This is the organization's only response to the outside world after fifteen months of speculation, doubt, and bearish predictions. Read in a broader context, its subtext is probably: we know what we are doing, and we don't care what you say.

And the answer V4 delivers is anything but a routine iteration.

In my opinion, the core significance of V4 does not lie in benchmark scores - although V4-Pro's 90.2% on the Apex Shortlist and its Codeforces rating of 3206 are a crushing performance among open-source models. The real dividing line lies in three numbers:

First, cost. At a 1M context, V4-Pro's single-token inference FLOPs are only 27% of V3.2's, and its KV cache only 10%; V4-Flash is more extreme still, pushed down to 10% and 7% respectively. This means that as the context expands from 128K to 1M - a nearly 8x increase in theoretical load - single-token compute consumption actually falls. In the AI industry, capability gains usually come at the cost of more compute; V4 breaks that rule. This reverse efficiency revolution suddenly makes economically feasible many Agent scenarios that previously existed only in white papers.
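As a sanity check on the arithmetic above, a minimal sketch: the 27%/10%/7% ratios are the ones quoted in the announcement, and the comparison uses only those ratios - no absolute FLOPs figures are assumed.

```python
# Back-of-envelope check of the "reverse efficiency" claim: context grows
# ~8x (128K -> 1M), yet per-token cost relative to V3.2 still falls.
# All ratios below are quoted from the announcement; nothing else is assumed.

context_growth = 1_000_000 / 128_000   # ~7.8x more context to attend over

ratios = {
    "V4-Pro   FLOPs/token vs V3.2": 0.27,
    "V4-Pro   KV cache    vs V3.2": 0.10,
    "V4-Flash FLOPs/token vs V3.2": 0.10,
    "V4-Flash KV cache    vs V3.2": 0.07,
}

print(f"context growth: {context_growth:.1f}x")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0%}")

# The paragraph's point: despite ~8x the load, every ratio is below 1.
assert context_growth > 7 and all(r < 1 for r in ratios.values())
```

The surprising part is not any single number but their direction: in the usual scaling regime every one of those ratios would be well above 1.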

Second, chips. V4 runs entirely on domestic chips such as Huawei Ascend and Cambricon, and its software stack has shifted from CUDA to the CANN framework. It is the world's first trillion-parameter MoE model deployed on purely domestic computing power. As Jensen Huang put it, "This is a bad result for the United States." V4 verifies one thing: even without the CUDA ecosystem, the underlying compute cycle of Chinese AI can still turn. The impact of this signal on the industrial chain is far more subversive than the model's benchmark scores.

Third, Agent. V4-Pro leads open-source models on Agentic Coding evaluations; its internal usage experience is better than Sonnet 4.5, with delivery quality close to Opus 4.6 in non-thinking mode. V4 has also been specially optimized for mainstream Agent frameworks such as Claude Code, OpenClaw, and CodeBuddy - this is not a model that "can chat" but a model that "can do work". Starting from V4, DeepSeek's positioning has clearly shifted to Agent infrastructure.

These three signals together point to a more fundamental judgment: After V4, the singularity of the Cambrian explosion of Chinese AI applications has arrived.

01 Singularity

This judgment needs explanation.

540 million years ago, the Cambrian explosion of life occurred. In a geological sense, it was almost instantaneous, and a large number of animal phyla with various forms emerged in the ocean. More precisely, identifiable animal fossil assemblages suddenly appeared in the fossil record. There is a consensus in the academic community that the premise of the Cambrian explosion is not a single factor, but the simultaneous satisfaction of multiple conditions such as oxygen concentration, ocean chemistry, ecological niche vacancy, and the evolution of Hox genes. The sudden leap in species diversity is because the underlying environment has reached a critical threshold.

Today, the underlying environment of the AI industry is reaching the same kind of threshold.

First is the cost threshold. V4-Flash is priced at 1 yuan per million input tokens (on a cache miss) and 2 yuan per million output tokens. V4 is fully compatible with domestic chips, which basically proves that the inference side of the Agent era need not rely heavily on NVIDIA's high-end GPUs.

This means a developer can process a context the size of "The Three-Body Problem" for a few yuan. When cost drops to this level, the question for application scenarios changes from "what can be done" to "why not give it a try" - and that is the real foundation for Agents to land.
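To make the claim concrete, a rough estimate under stated assumptions: the per-million-token prices are from the announcement, but the token counts and exchange rate below are illustrative guesses, not official figures.

```python
# Rough cost of one full pass over a novel-length context with V4-Flash.
# Prices per the announcement: 1 yuan / 1M input tokens (cache miss),
# 2 yuan / 1M output tokens. Token counts and FX rate are assumptions.

INPUT_CNY_PER_M = 1.0    # yuan per 1M input tokens (cache not hit)
OUTPUT_CNY_PER_M = 2.0   # yuan per 1M output tokens

input_tokens = 900_000   # assumed: a "Three-Body Problem"-scale context
output_tokens = 50_000   # assumed: a long summary or analysis in return
cny_per_usd = 7.2        # assumed exchange rate

cost_cny = (input_tokens / 1e6) * INPUT_CNY_PER_M \
         + (output_tokens / 1e6) * OUTPUT_CNY_PER_M
print(f"~{cost_cny:.2f} yuan (~${cost_cny / cny_per_usd:.2f}) per full pass")
```

Under these assumptions a full pass lands around one yuan, so even many retries stay within pocket change - which is exactly the economic point being made.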

Second is the performance threshold. In Agentic Coding evaluations, V4-Pro is already the strongest open-source model, and its internal usage experience is better than Sonnet 4.5. Under the 1M-context setting, V4-Pro's single-token inference FLOPs are only 27% of V3.2's, an efficiency breakthrough that leads globally. In math, STEM, and competitive-coding evaluations, V4-Pro surpasses every publicly evaluated open-source model and is comparable to the world's top closed-source models.

This means the model's "intelligence density" - the effective intelligence produced per unit of compute - has crossed a critical point.

Finally, there is the toolchain threshold. V4 has been specially optimized for mainstream Agent frameworks such as Claude Code, OpenClaw, and CodeBuddy, with improvements on code tasks and document-generation tasks. A one-million-token context has become standard across all official services.

This is precisely the prerequisite for Agents to work autonomously over long stretches: no longer a "toy", but something that can be deployed directly in production.

The three thresholds break simultaneously: cost low enough for mass deployment, performance strong enough for the job, ecosystem ready for landing. This is not a linear improvement; it is a phase change.

An interesting counterpoint: in the V4 technical report, the team admits the model still lags GPT-5.4 and Gemini-3.1-Pro by roughly 3 to 6 months. This admission actually shows that V4's significance lies not in catching anyone but in laying a solid floor of "basic intelligence". Once that floor is in place, the application layer above it emerges on its own.

Historical experience is straightforward: every qualitative change in underlying infrastructure triggers a Cambrian explosion in the application layer. Amazon Web Services pushed computing cost below the threshold and triggered a global SaaS entrepreneurship wave; 4G tariffs dropped below the threshold and triggered the era of short video and live-stream e-commerce. Today, DeepSeek V4 is pushing the cost of basic intelligence below the same kind of threshold - and this time, what is being made cheap is intelligence itself.

02 Earthquake

Let's take a closer look at the first signals of this Cambrian explosion.

What deserves the most attention is the reconstruction speed of the industrial chain: After V4 was adapted to Huawei Ascend 950PR, domestic chip companies such as Cambricon, Hygon Information, and Moore Threads accelerated their adaptation synchronously, and giants such as Alibaba, ByteDance, and Tencent increased their procurement of Ascend chips.

This is not just the release of a model company, but the start of an entire domestic computing power industrial chain.

The chain reaction in the application layer is just as intense. The most direct impact is a qualitative change in Agent economics. At V4-Flash's per-million-token API cost, the budget for a task that reads a medium-sized code repository end to end is only a few yuan. Under this cost structure, "letting Agents make mistakes" becomes reasonable engineering practice for the first time, and the economic foundation for deploying Agents at scale is in place.

In addition, V4-Flash's price of as little as 0.2 yuan per million input tokens has all but pushed AI inference into a new "utility" stage, priced like water, electricity, and gas. When the marginal cost of intelligence approaches zero, the entire application layer - customer service, e-commerce, education, healthcare, law - will be redefined at an unimaginable speed.

There is also an easily overlooked signal: before the V4 release, DeepSeek launched external financing for the first time, reportedly at a target valuation above $20 billion. In my opinion, this reads less like a fundraise than like an institution stocking ammunition for large-scale deployment now that its core infrastructure is built.

03 Paradigm Shift

Placed in a larger coordinate system, the industrial impact of V4 reveals a deeper change taking shape: Chinese AI is shifting from "catching up on model capability" to "closing the ecosystem loop", with model, chip, and application forming a positive feedback cycle.

V4 is the first proof that domestic chips can carry trillion-parameter models, directly driving synchronized growth at companies such as Cambricon, Hygon, and Moore Threads. With market validation in hand, these chip companies can invest in next-generation R&D with greater confidence.

The next-generation chips will have stronger performance and lower cost, which in turn will reduce the model inference cost and give rise to more developers and application scenarios. The expansion of application scenarios will generate more data and feedback, further promoting the improvement of model capabilities. The "model - chip - cloud" closed loop in China is moving from "logically established" to "factually established".

At the chip-ecosystem level, NVIDIA and its CUDA stack remain the best, and sometimes the only, globally competitive option for the pre-training stage. But after V4 became the first trillion-parameter model whose inference does not rely on NVIDIA CUDA, a turning signal emerged: Chinese AI is moving from single-point breakthroughs to systematic generational evolution. The new narrative: Chinese AI is building a complete technical closed loop from chips to models to applications, independent of the NVIDIA ecosystem.

Once this closed loop works, its significance far exceeds the capability breakthrough of any single model.

DeepSeek V4 is becoming infrastructure in its own right. Its open-source ecosystem, cost and pricing strategy, and chip-adaptation path are about to reshape the entire Chinese AI application landscape.

04 Cambrian

Let's broaden our perspective.

In the past few years, the AI industry has been answering one question: where is the ceiling of model capability? From GPT-4 to GPT-5.4, from Claude to Gemini, everyone has been sprinting along the single path of "larger parameters, higher intelligence". But this framework has a blind spot: once model intelligence reaches a certain level, what determines the industrial pattern is no longer "whose model is smarter" but "whose model can be deployed more widely and used more deeply".

The emergence of V4 is shifting the focus of competition in the AI industry from "capability competition" to "ecosystem competition".

This is not the victory of a single model, but the victory of the open-source ecosystem over the closed-source barriers, the victory of cost reconstruction over the computing power threshold, the victory of the domestic technology stack over the technology monopoly, and the victory of developers and users over the pricing power of a few giants.

DeepSeek V4 is open-sourced under the MIT license, which means any developer in the world can deploy it locally, use it commercially, and build on it freely. This degree of openness is accelerating the erosion of closed-source giants' moats.

05 Conclusion

"Not tempted by praise, not frightened by slander."

In an industry full of noise and gamesmanship, one ability is scarce: when everyone is doubting, stay silent and keep writing code; when everyone is predicting failure, open the terminal and keep training the next version. V4 arrived fifteen months late.

But those fifteen months were not wasted. They were spent migrating from CUDA to CANN, growing the context from 128K to 1M while cutting cost, and closing in on the global first echelon in Agent capability.

These silent engineering efforts will not appear in any eye-catching reports. But they are becoming the cornerstone of AI applications in the next decade.

You will find that the most fertile soil for Chinese AI applications is ready. A great explosion of intelligent species is accumulating energy beneath the surface.

This article is from the WeChat official account "Silicon-based Starlight", author: Yuantai. Republished by 36Kr with permission.