
The 10 Amazing Easter Eggs Hidden in the DeepSeek-V4 Technical Report: Even "Alchemy Metaphysics" Makes It into the Paper

HeFan Finance (盒饭财经) · April 27, 2026, 10:23
DeepSeek has pushed "saving money" and "saving resources" to an extreme.

Finally, DeepSeek-V4 has arrived.

On April 24th, the official DeepSeek account published an article titled "DeepSeek-V4 Preview Version: Entering the Era of Universal Access to Million-Context", officially announcing that "the preview version of the new series of models, DeepSeek-V4, is officially launched and open-sourced simultaneously."

The article also introduced DeepSeek-V4's ultra-long context of one million tokens and its leading position among domestic and open-source models in agent capabilities, world knowledge, and reasoning performance. The model comes in two sizes.

The release has already prompted extensive evaluation and discussion, so there is no need to rehash it here.

HeFan Finance noticed that DeepSeek simultaneously released a technical report on DeepSeek-V4. The address is as follows: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

The technical report, titled "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence", runs 55 pages and covers V4 from six aspects, including architecture, infrastructure, pre-training, and post-training. Ten interesting little Easter eggs are hidden in this otherwise highly technical document.

Easter Egg 1: "Think Max" Mode, a "Pressuring" Instruction That Allows No Shortcuts

Location: Page 30, Table 3

The original text is:

Reasoning Effort: Absolute maximum with no shortcuts permitted. You MUST be very thorough in your thinking... rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios.

Translated, it roughly means:

Reasoning effort: absolute maximum, no shortcuts permitted. Your thinking must be extremely thorough, deconstructing the problem comprehensively to reach its root cause and rigorously stress-testing your logic against all possible paths, edge cases, and adversarial scenarios. Write out the complete process of in-depth thinking, recording every intermediate step, every alternative considered, and every rejected hypothesis, so that absolutely no presupposition goes unexamined.

This passage is the system prompt that the serving backend quietly injects when the user activates Think Max (extreme thinking mode). It reads with a strong sense of pressure, like a strict tutor forcing a student to exhaust every ounce of mental effort with no slack allowed.

DeepSeek wrote an extremely strict system prompt for this mode, phrased entirely in absolute imperatives: "absolute maximum", "no shortcuts permitted", "must be thorough", "rigorously stress-test", "no unexamined presuppositions". It explicitly forbids the model from taking shortcuts and requires it to record every rejected hypothesis and intermediate step.

Through this engineering-style prompt, DeepSeek squeezes the model's compute within the 1M (million-token) context to verify code and catch logical errors. It is like casting a "logic spell" on the model, ensuring that when handling complex logic or code it will not skip details for the sake of speed.
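How such mode-conditional prompt injection typically works can be sketched in a few lines. This is an illustrative guess, not DeepSeek's actual serving code: the constant name, function name, and message format below are all assumptions, and the prompt text is abridged from the report's Table 3 quote.

```python
# Hypothetical sketch of a serving layer injecting a mode-specific system
# prompt. THINK_MAX_PROMPT and build_messages are illustrative names, not
# identifiers from the DeepSeek-V4 report.

THINK_MAX_PROMPT = (
    "Reasoning Effort: Absolute maximum with no shortcuts permitted. "
    "You MUST be very thorough in your thinking, rigorously stress-testing "
    "your logic against all potential paths, edge cases, and adversarial "
    "scenarios."
)

def build_messages(user_prompt: str, think_max: bool = False) -> list[dict]:
    """Prepend the strict system prompt only when Think Max mode is on."""
    messages = []
    if think_max:
        # The user never sees this; the backend inserts it silently.
        messages.append({"role": "system", "content": THINK_MAX_PROMPT})
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages("Find the bug in this function.", think_max=True)
```

With `think_max=False` the request goes through unmodified; with it on, every generation is conditioned on the "no shortcuts" instruction.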

Easter Egg 2: An "Open Letter" to Hardware Manufacturers: Stop Wasting Efforts on Bandwidth

Location: Page 16, Section 3.1

The original text is:

Once bandwidth meets this threshold, it ceases to be the bottleneck, and devoting additional silicon area to further bandwidth brings diminishing returns. We encourage future hardware designs to target such balance points rather than scale bandwidth unconditionally.

It means:

Once bandwidth reaches this threshold, it is no longer the bottleneck; spending additional chip area on further increasing bandwidth yields diminishing marginal returns. We encourage future hardware designs to target such balance points rather than scale bandwidth unconditionally.

In the report, DeepSeek took the initiative to write a "prescription" for hardware makers such as NVIDIA and Huawei, politely but firmly stating its view: blindly increasing bandwidth does little for current AI training efficiency, and manufacturers would do better to spend chip area where it improves the compute-to-communication ratio.

Easter Egg 3: Extreme Efficiency, Only 10% of the Cache of V3.2 at 1M Length

Location: Abstract

The original text:

In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.

It means:

In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.

DeepSeek has pushed "saving money" and "saving resources" to an extreme.

Through CSA (Compressed Sparse Attention) and HCA (Heavily Compressed Attention), its memory footprint when processing million-token texts is only one tenth of the previous version's. That means running million-token text analysis on personal computers, and eventually even phones, becomes conceivable.
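To see what a 10x cache reduction buys in practice, here is a back-of-the-envelope calculation. Only the 27% and 10% ratios come from the report's abstract; the V3.2 baseline figure of 400 GB at one million tokens is a made-up placeholder for illustration.

```python
# Back-of-the-envelope sketch of the abstract's efficiency claims.
# Only the two ratios are from the report; the baseline is hypothetical.

FLOPS_RATIO = 0.27   # V4-Pro single-token inference FLOPs vs. V3.2
KV_RATIO = 0.10      # V4-Pro KV-cache size vs. V3.2

baseline_kv_gb = 400.0                 # hypothetical V3.2 cache at 1M tokens
v4_kv_gb = baseline_kv_gb * KV_RATIO   # what V4-Pro would need instead

print(f"V4-Pro KV cache: {v4_kv_gb:.0f} GB vs. {baseline_kv_gb:.0f} GB")
# prints: V4-Pro KV cache: 40 GB vs. 400 GB
```

Under this (invented) baseline, the cache drops from a multi-GPU-server problem to something a single high-memory workstation could hold, which is the substance of the "run it on personal hardware" hope.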

Easter Egg 4: Frank "Alchemy Metaphysics": Knowing the What but Not the Why

Location: Page 26, Section 4.2.3

The original text is:

Although a comprehensive theoretical understanding of their underlying mechanisms remains an open question for now, we are sharing them openly to foster further exploration by the community.

It means: Although a comprehensive theoretical understanding of its underlying mechanisms remains an open question for now, we are sharing it openly to promote further exploration by the community.

In the section on mitigating training instability, the DeepSeek team shared two home-grown techniques for curing training collapse in trillion-parameter models: Anticipatory Routing and SwiGLU Clamping.

The report admits this very plainly. This kind of candor, "we don't yet know why it works, but it really does, so take it and use it", is a true portrait of the AI "alchemy" world and shows a genuinely open-source spirit.

Easter Egg 5: Special Tokens for "Quick Instruction"

Location: Page 33, Table 5

<|action|> (Judge whether to search the web), <|title|> (Generate a title), <|query|> (Generate a search term).

To make the chatbot respond faster, DeepSeek implanted a series of special-token "ciphers" inside the model.

Part of why V4 feels so fast is that it directly reuses the pre-computed long-text KV cache. There is no need, as before, to feed hundreds of thousands of words to a separate small model for judgment; "redundant prefilling" is eliminated entirely, and the user's waiting time shrinks noticeably.
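The cost asymmetry the article describes can be sketched abstractly. This is not DeepSeek's implementation; the class, method names, and token-count accounting below are invented to illustrate the idea of decoding a lightweight decision (web-search judgment, title, query) on top of an already-prefilled cache rather than re-running a separate model over the whole conversation.

```python
# Illustrative sketch (invented names, not DeepSeek's code): lightweight
# decisions are requested via special tokens and decoded from the cached
# prefix, so the long context is never prefilled a second time.

SPECIAL_TOKENS = {"<|action|>", "<|title|>", "<|query|>"}

class CachedSession:
    def __init__(self) -> None:
        self.kv_cache_tokens = 0  # tokens already prefilled this session

    def prefill(self, n_tokens: int) -> None:
        """Run the expensive prefill once, when the context first arrives."""
        self.kv_cache_tokens += n_tokens

    def quick_instruction(self, special_token: str) -> int:
        """Answer a side-task from the cached prefix.

        Returns the extra prefill cost in tokens: zero, because the
        existing KV cache is reused instead of re-reading the context.
        """
        if special_token not in SPECIAL_TOKENS:
            raise ValueError(f"unknown special token: {special_token}")
        return 0

session = CachedSession()
session.prefill(800_000)                        # long document, paid once
cost = session.quick_instruction("<|query|>")   # e.g. generate a search term
```

The contrast with the old pipeline is the return value: routing the same 800k-token context through a separate classifier model would cost another full prefill, while the special-token path costs only a handful of decode steps.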

Easter Egg 6: Ranked 23rd Globally on Codeforces

Location: Page 39, Section 5.3.2

The original text is: On the Codeforces leaderboard, DeepSeek-V4-Pro-Max currently ranks 23rd among human candidates.

This means that on the Codeforces leaderboard, DeepSeek-V4-Pro-Max currently ranks 23rd among human participants.

This Easter egg carries real weight. On Codeforces, the world's top competitive-programming leaderboard populated by human contestants, DeepSeek-V4's estimated rating of 3206 points is enough to rank 23rd globally. That means it has surpassed most top programmers and entered the first echelon of human programming ability.

Easter Egg 7: An Internal "Employee Survey": 52% Say It's Ready to Be Their Default

Location: Page 44, Section 5.4.4

The original text is:

In a survey asking DeepSeek developers and researchers (N = 85) — all with experience of using DeepSeek-V4-Pro for agentic coding in their daily work — whether DeepSeek-V4-Pro is ready to serve as their default and primary coding model compared to other frontier models, 52% said yes, 39% leaned toward yes, and fewer than 9% said no.

Translated, it is:

In a survey of DeepSeek developers and researchers (N = 85), all with experience using DeepSeek-V4-Pro for agentic coding in their daily work, respondents were asked whether DeepSeek-V4-Pro is ready to serve as their default and primary coding model compared with other frontier models: 52% said yes, 39% leaned toward yes, and fewer than 9% said no.

It is rare for DeepSeek to publicly disclose real feedback from 85 of its own developers and researchers. More than half of the company's core staff have already made it their preferred daily coding tool. This kind of "eating your own dog food" says more about how the model performs in real production than benchmark numbers do.

Easter Egg 8: Real "Complaints" from Internal Employees Are Written into the Technical Report

Location: Page 44, Section 5.4.4

The original text:

Respondents find DeepSeek-V4-Pro to deliver satisfactory results across most tasks, but note trivial mistakes, misinterpretation of vague prompts, and occasional over-thinking.

Translated, it is:

Respondents believe DeepSeek-V4-Pro gives satisfactory results on most tasks, but also point out minor mistakes, misreadings of vague prompts, and occasional over-thinking.

This sentence sits right next to the internal-survey Easter egg above, and DeepSeek chose to write its employees' complaints into the report as well.

Easter Egg 9: Down-to-Earth "Chinese-Style" Evaluation Questions

Location: Page 43, Figure 13

To demonstrate the model's ability on complex, long-form white-collar work, the sample tasks DeepSeek released are remarkably down-to-earth.

"Write a co-marketing plan for a well-known milk-tea brand and the Beijing Subway" and "design UGC communication and social fission". Compared with foreign large models tested on writing English Shakespearean verse, DeepSeek's evaluation questions really understand the daily PPT needs of domestic office workers.

Easter Egg 10: The Mysterious Tester Dolly Deng in the Acknowledgment List

Location: Page 55, Appendix A.2 Acknowledgment

In the Acknowledgment section of Appendix A.2, beyond the full author list, the team specifically thanked a non-author: "We would like to thank Dolly Deng and other testers for their valuable suggestions."