
Liang Wenfeng draws an arrow, and Jensen Huang is uneasy.

盒饭财经 (Hefan Caijing), 2026-04-02 12:06
The "arrow" is on the string, but it has yet to be loosed.

DeepSeek has not said a word, yet the whole internet keeps watching it closely.

From the evening of March 29th to the morning of the 30th, DeepSeek experienced a large-scale service outage. According to the Global Times, starting at 22:00 on the 29th, both the web version and the App became completely unresponsive, with frequent pop-ups of the "Server is busy" message, and related functions could not be used normally. By the early morning of the 30th, some users still reported that they could not use it properly.

After that, the topic "DeepSeek is down" quickly climbed into the top ten hot search lists on multiple platforms such as Baidu, Weibo, and Toutiao. The overseas tech circle is also paying attention. In addition to media reports, many professional users have conducted actual tests, comparing the execution results of the same task before and after.

In contrast to the intense outside attention and speculation, DeepSeek's official channels offered no progress reports or explanations. On the morning of March 30th, the official status page posted an announcement, "[Resolved] Abnormal performance of DeepSeek web/APP", and the service status returned to "normal".

In the first half of March, speculation was also stirred by a mysterious model called Hunter Alpha that appeared on OpenRouter. At the time, many developers believed it might be a stealth beta of V4. It later turned out that the mystery model was not V4 but an internal test build of Xiaomi's flagship model, MiMo-V2-Pro. DeepSeek offered no comment on that speculation either.

The subtle tension between motion and stillness traces back to Liang Wenfeng: the "arrow" is on the string, but it has yet to be loosed.

In late 2024, V3 was released; R1 followed in early 2025. DeepSeek instantly caught up with giants like OpenAI and topped the App Store charts in China, the United States, and elsewhere. Beyond the product and the technology, its extraordinarily low compute cost sent a shock through US semiconductor stocks. Now the industry is holding its breath for Liang Wenfeng's next big move: DeepSeek V4. But V4, originally expected in the first quarter of this year, has been repeatedly postponed.

Outside estimates of V4's release date have been pushed back again and again: from February, to around the Spring Festival, to early March, and now to April at the earliest. Meanwhile V4's positioning, architecture, performance, context window, pricing, supply chain, and more remain under intense scrutiny, with rumors circulating continuously.

Among them, one supply-chain report has triggered speculation that goes well beyond technology. According to Reuters, two sources familiar with the matter said that, ahead of its coming major model update, DeepSeek did not show the upcoming flagship model to US chip makers, breaking standard industry practice.

These few lines of text hint at an attempt to "cut things off at the source".

Nvidia's trillion-dollar market value rests not only on its GPU hardware, but also on the CUDA software ecosystem it has polished for more than a decade. CUDA is like a well-paved road for the world's AI developers: follow it, and you can always "achieve miracles with brute force". But if the above report is accurate, what DeepSeek intends to build is a highway that bypasses CUDA altogether.

What's more dramatic is that judging from the papers and open-source projects successively released by DeepSeek-related parties from December 2025 to the present, these speculations are not groundless.

1

Looking for changes with a magnifying glass

On the evening of March 29th local time, a user named "AiBattle" on the X platform posted a tweet.

The DeepSeek model that they serve on the WEB/APP may have been updated again

The model does seem to consistently identify itself as V3 now

The zero-shot coding outputs I’m getting now also seem different in style from the ones I got a few days ago

It needs more testing to be completely sure


The accompanying picture is a before-and-after comparison of two pelicans riding bicycles.

The comparison makes it plain that the model's spatial and graphics-code capabilities improved sharply: composition, color matching, and element logic all visibly outperform the version from a week earlier. As of 18:47 on March 31st, the tweet had accumulated 162,800 views.

Using SVG (Scalable Vector Graphics) to draw a pelican riding a bicycle is often regarded as an extreme test question for the spatial and rendering capabilities of large models.

The test comes from Simon Willison, a globally renowned open-source developer and co-founder of the Django framework. He believes today's large-model leaderboard numbers are heavily inflated. SVG is, at bottom, nothing but coordinates, curve formulas, and color codes: pure code. Asking a text-only AI with no eyes or hands to render the anatomy of a "pelican" and the mechanical structure of a "bicycle" in code directly exposes a model's spatial imagination and code logic.
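To make the test concrete, here is a minimal sketch of how such an output might be checked programmatically. The `model_output` string is our own illustrative stand-in for a model's reply, and `looks_plausible` is a hypothetical helper, not part of any benchmark:

```python
import xml.etree.ElementTree as ET

# A stand-in for a model's reply to the prompt
# "Generate an SVG of a pelican riding a bicycle" -- illustrative only.
model_output = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <circle cx="60" cy="90" r="20" fill="none" stroke="black"/>
  <circle cx="140" cy="90" r="20" fill="none" stroke="black"/>
  <line x1="60" y1="90" x2="140" y2="90" stroke="black"/>
  <ellipse cx="100" cy="55" rx="25" ry="15" fill="white" stroke="black"/>
  <path d="M120 50 L150 45 L125 58 Z" fill="orange"/>
</svg>"""

def looks_plausible(svg_text: str) -> bool:
    """Crude structural check: does the drawing parse as XML, and does it
    contain at least two wheels (circles) plus a few other shapes?"""
    root = ET.fromstring(svg_text)          # raises if not well-formed
    ns = "{http://www.w3.org/2000/svg}"
    circles = root.findall(f"{ns}circle")
    shapes = [e for e in root.iter() if isinstance(e.tag, str) and e.tag.startswith(ns)]
    return len(circles) >= 2 and len(shapes) >= 4

print(looks_plausible(model_output))
```

Of course, the real evaluation Willison performs is visual: a syntactically valid SVG can still be an unrecognizable scribble, which is exactly why the test is hard for text-only models.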

One day later, the user posted the results of those "more tests".

"AiBattle" said that after a 7-hour outage, DeepSeek may have modified the model again. Before the outage, the model identified itself as the V3 version. Now it has changed back to claiming to be the "latest version". The quality of the SVG also seems to have become worse, returning to the previous state.

Developers like "AiBattle" are far from rare. The AI community, left hanging by DeepSeek, is combing through the product with a magnifying glass for clues that V4 is on the verge of release.

For example, they found that the knowledge base cut-off date may have been quietly postponed. Some users found that without enabling online search, DeepSeek knows the results of the 2025 US elections, but knows nothing about the major events in February 2026. This makes the outside world speculate that the knowledge cut-off date of the new version may be January 2026.

Context length is another clue. On February 11th, DeepSeek quietly expanded the existing model's context window from 128K to 1M tokens and updated the knowledge cut-off to May 2025. Many in the community read this as pre-launch testing of the infrastructure for V4.

Underlying technology papers are often the trailers and instruction manuals for the next generation of large models.

Compared with the speculations from the outside world and the tests in the community, what is more certain is the papers and open-source projects released by DeepSeek since the end of 2025.

On December 31st, 2025, Liang Wenfeng uploaded and published a paper titled "mHC: Manifold-Constrained Hyper-Connections".

The paper tackles the training collapse caused by exponential signal amplification (up to 3000×) when traditional Hyper-Connections are scaled up: by projecting HC's residual space onto a specific manifold, it restores the identity-mapping property and ensures information conservation.

Liang Wenfeng's name appears in the author list of the paper.

In January 2026, DeepSeek released a research result called "Engram" on GitHub and simultaneously uploaded a paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models".

This work is known in the industry as a trick for "replacing large models' rote memorization with hash tables".

On February 26th, DeepSeek, jointly with Peking University and Tsinghua University, released a new inference-architecture paper, "DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference", which addresses the long-context throughput bottleneck of agents.

These may be the three sharp arrows prepared by Liang Wenfeng.

2

Aiming at the target, trying to squeeze every bit of performance out of the hardware

In today's large-model arms race, a top AI company publishing underlying-technology papers is no longer just the traditional "show of strength".

These three arrows are naturally aimed at specific targets.

On the surface, the three technologies of mHC, Engram, and DualPath belong to three completely different fields: algorithm mathematics, model architecture, and system engineering. But if you put them together, you will find that they are not isolated academic papers.

First, look at mHC (Manifold-Constrained Hyper-Connections). It's like a stable skeleton, solving the problem of "being able to train".

When the model parameters soar to hundreds of billions or even trillions, traditional residual connections will become a "narrow gate" for information flow. But randomly adding cross-layer connections will lead to training collapse. mHC constrains these connections on a specific mathematical manifold (doubly stochastic matrix), ensuring that ultra-large models can still be stably trained under extremely deep and wide architectures.

How to understand it more straightforwardly?

This new connection structure is like paying a little to get a lot of efficiency out of a team: a slight adjustment to the "inter-departmental communication mechanism" inside the AI. It adds roughly 6-7% communication overhead, but in exchange the model no longer suffers "coordination chaos" during training, making it both more stable and smarter.
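The "information conservation" property described above can be illustrated numerically. A doubly stochastic matrix (every row and column sums to 1) maps the all-ones signal to itself, so stacking many such layers cannot amplify the signal. The sketch below uses classic Sinkhorn normalization as a stand-in projection; it is our illustration of the general idea, not the mHC paper's actual procedure:

```python
import numpy as np

def sinkhorn_project(W, iters=50):
    """Project a non-negative connection matrix onto (approximately)
    doubly stochastic matrices: every row and column sums to 1.
    Illustrative stand-in for a manifold constraint, not mHC's exact math."""
    W = np.abs(W) + 1e-9                      # Sinkhorn needs positive entries
    for _ in range(iters):
        W = W / W.sum(axis=1, keepdims=True)  # normalize rows
        W = W / W.sum(axis=0, keepdims=True)  # normalize columns
    return W

rng = np.random.default_rng(0)
raw = rng.uniform(0.5, 2.0, size=(4, 4))      # unconstrained connection weights

# Unconstrained: repeated application blows the signal up layer after layer.
x = np.ones(4)
for _ in range(30):
    x = raw @ x
print(f"unconstrained norm after 30 layers: {np.linalg.norm(x):.3e}")

# Constrained: a doubly stochastic matrix maps the all-ones vector to itself,
# so the signal scale is conserved through arbitrarily many layers.
P = sinkhorn_project(raw)
y = np.ones(4)
for _ in range(30):
    y = P @ y
print(f"constrained norm after 30 layers:   {np.linalg.norm(y):.3e}")
```

Running this, the unconstrained path explodes by many orders of magnitude while the constrained path stays at its original scale, which is the flavor of "no more exponential amplification" the paper is after.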

And Engram (conditional memory module) is like an external hard drive, solving the problem of "remembering and being smart enough".

On top of the ultra-large skeleton built by mHC, Engram starts partitioning the "brain". Large models used to rote-memorize all knowledge in expensive neural-network weights, forcing "inference computation" and "memory" to compete for the same resources. Engram introduces a new axis of sparsity: static knowledge is packed into hash tables and offloaded to cheap CPU memory, retrieved via O(1) lookups, freeing the precious GPU compute entirely for complex logical reasoning.

This sparsity is like teaching large models to "skim" and "grasp the key points". Previously, an AI reading a long article had to read every word carefully, filler included. With the "skimming" ability, its reading speed on long texts roughly doubles, greatly easing the computational pressure.
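A generic way to implement "skimming" is top-k sparse attention: each query token attends only to its k highest-scoring keys instead of all of them. The sketch below is a textbook top-k variant for illustration, not DeepSeek's actual sparse-attention design:

```python
import numpy as np

def topk_attention(Q, K, V, k):
    """Each query attends only to its k highest-scoring keys ("skimming")
    instead of every key ("reading every word"). Generic illustrative sketch."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # (n_q, n_k) similarities
    # Keep only each query's top-k scores; mask the rest to -inf.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving entries, then mix the values.
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(1)
n, d = 1024, 64                                    # a "long article" of 1024 tokens
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = topk_attention(Q, K, V, k=32)                # each token reads only 32 others
print(out.shape)
```

In this toy version the full score matrix is still computed; real sparse-attention kernels avoid that, which is where the actual compute savings come from.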

To understand it more straightforwardly, this kind of knowledge storage is similar to changing rote memorization into looking up a dictionary. In the past, in order to remember who wrote a certain book or what the capital of a certain country is, AI needed to consume a large amount of computing power in its "brain" to memorize it. Now, DeepSeek's approach is to take out this "fixed knowledge" and make it into a "dictionary". When the AI encounters such problems, it can directly "look up the dictionary" without wasting mental power, saving all the computing power for "logical reasoning" and "thinking".
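The "look it up instead of memorizing it" idea can be sketched in a few lines. The names and structure below are ours, chosen for illustration; they do not come from the Engram paper:

```python
# Toy illustration of routing fixed knowledge to a hash table in cheap
# CPU memory, so expensive "reasoning" runs only on lookup misses.

knowledge_table = {                       # O(1) lookup, held in host RAM
    ("capital_of", "France"): "Paris",
    ("author_of", "Hamlet"): "William Shakespeare",
}

def expensive_reasoning(query):
    """Stand-in for a full forward pass on the GPU."""
    return f"<reasoned answer for {query}>"

def answer(query):
    hit = knowledge_table.get(query)      # constant-time hash lookup
    if hit is not None:
        return hit                        # no GPU compute spent
    return expensive_reasoning(query)     # GPU reserved for real reasoning

print(answer(("capital_of", "France")))   # Paris
print(answer(("derivative_of", "x**2")))  # misses, falls through to reasoning
```

The real system obviously does not use literal Python tuples as keys; the point is the division of labor, with cheap, exact retrieval handling static facts and the model handling everything else.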

DualPath (a dual-path inference framework) is like an efficient logistics system, solving the problem of "being affordable and running fast".

Once the model is trained and the "brain" is smarter, acting as an agent on hundred-thousand-word texts and multi-round coding tasks generates a huge context cache (KV-Cache). At that point computation is no longer the bottleneck; the I/O bandwidth of moving data from storage to the GPU is. DualPath cleverly enlists the network cards of the cluster's otherwise idle decode (Decode) nodes to help the prefill (Prefill) nodes move data, raising end-to-end throughput by nearly 2×.

To put it simply, when AI is dealing with extremely long tasks, it's not that its "brain" is not capable enough, but that the slow speed of its "hands and feet" in moving data is holding it back. The DualPath technology is like a smart workshop director, mobilizing the idle "transport vehicles" from elsewhere to help move data, directly doubling the overall work efficiency.
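The arithmetic behind the "nearly 2×" claim is simple: if decode nodes contribute as many idle NICs as the prefill nodes already have, aggregate transfer bandwidth doubles. The numbers below are made up for illustration:

```python
# Back-of-the-envelope sketch of the idea described above: enlisting
# idle decode-node NICs to move KV-cache alongside the prefill nodes.
# All figures are hypothetical.

KV_CACHE_GB   = 400          # cache to move for a batch of long-context jobs
NIC_GBPS      = 25           # per-node NIC bandwidth, GB/s (assumed)
PREFILL_NODES = 4
DECODE_NODES  = 4            # their NICs sit mostly idle during prefill

def transfer_seconds(total_gb, nodes, nic_gbps=NIC_GBPS):
    """Time to move the cache with `nodes` NICs working in parallel."""
    return total_gb / (nodes * nic_gbps)

baseline = transfer_seconds(KV_CACHE_GB, PREFILL_NODES)
dualpath = transfer_seconds(KV_CACHE_GB, PREFILL_NODES + DECODE_NODES)

print(f"prefill NICs only: {baseline:.1f} s")   # 4.0 s
print(f"with decode NICs:  {dualpath:.1f} s")   # 2.0 s, i.e. ~2x throughput
```

The real gain depends on how idle the decode NICs actually are and on coordination overhead, which is presumably what the paper's scheduling machinery handles.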

In principle, mHC takes aim at the "CUDA ecosystem wall", showing that non-Nvidia hardware plus extreme mathematical communication optimization can still run trillion-parameter models; Engram targets "VRAM anxiety", evicting fixed knowledge from the GPU and sharply lowering the hardware bar for inference; DualPath attacks the "agent throughput bottleneck", substantially raising concurrent processing capacity.

Although these three technologies seem to solve different problems, their underlying technological beliefs are completely the same: Don't blindly believe in the stacking of computing power. Through extreme decoupling, squeeze every bit of performance out of the hardware.

However, this runs counter to the Scaling Law that Silicon Valley believes in. The core of the Scaling Law is "achieving miracles with brute force": using more chips, and more advanced chips, to train ever smarter models.

Under this path, large models are getting bigger and bigger, and the demand for computing power