DeepSeek has permanently reduced its prices, and the first winner has emerged.
Last Friday, DeepSeek announced that the 25% discount on its API is now permanent rather than temporary.
For developers, the price itself is unchanged; what has changed is the duration of the offer, from one month to indefinite. Users around the world were thrilled. But price is only the surface. The variable that really deserves attention lies elsewhere: a coding agent named Reasonix is going viral on GitHub.
Its premise is simple: it works only with DeepSeek, and through aggressive engineering optimization it cuts usage costs by a further 80%.
Two threads, one visible and one hidden, are unfolding at once. How does Reasonix exploit DeepSeek's underlying features to gain such a lopsided advantage? Why is the engineered combination of "model + agent" displacing raw model performance as the thing that matters? These are the questions to dissect.
01
"Prefix Caching" and "Byte Fingerprint"
Let's start with the term "Prefix Caching", an inference optimization for large language models that has been widely adopted since last year.
The core idea is simple: keep the KV cache computed for earlier conversation turns so that later requests can reuse those intermediate results directly, which significantly reduces time to first token and improves inference efficiency.
The technical details are somewhat tedious, so most developers have only an intuitive sense that DeepSeek's prefix caching "saves money". The Reasonix team, however, grasped what matters at the lowest level: the prompt must be byte-stable.
To understand Reasonix, you first need to understand the logic of DeepSeek's caching: Prefix Hash.
Imagine that, to the machine, the prompt a user sends is just a very long string of numbers. A hash algorithm issues a "unique digital signature", called a "fingerprint", for the text behind that string. As long as the fingerprint of what the user sends matches the fingerprint of content cached on the server, that part does not need to be recomputed, and the cost can be discounted by 20%.
But just as no two people share a fingerprint, this caching logic has a fatal weakness: the conversation must be identical from the very beginning, character for character.
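The matching rule described above can be illustrated with a small sketch. This is not DeepSeek's actual implementation: the block size, the choice of SHA-256, and the function names are all assumptions made for illustration. It only shows why a change near the start of a prompt invalidates everything after it, while appended text leaves earlier fingerprints intact.

```python
import hashlib

# Hypothetical cache granularity in bytes; the real unit is not stated in this article.
BLOCK_SIZE = 64


def prefix_fingerprints(prompt: str, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one fingerprint per full block, each covering the entire prefix up to that block."""
    data = prompt.encode("utf-8")
    hasher = hashlib.sha256()
    fingerprints = []
    for i in range(len(data) // block_size):
        hasher.update(data[i * block_size:(i + 1) * block_size])
        # copy() lets us read the running digest without disturbing the hasher.
        fingerprints.append(hasher.copy().hexdigest())
    return fingerprints


def cached_prefix_bytes(cached: str, incoming: str, block_size: int = BLOCK_SIZE) -> int:
    """Count the leading bytes of `incoming` whose fingerprints match those of `cached`."""
    hits = 0
    for old, new in zip(prefix_fingerprints(cached, block_size),
                        prefix_fingerprints(incoming, block_size)):
        if old != new:
            break  # the first mismatch ends the reusable prefix
        hits += 1
    return hits * block_size


cached = "S" * 128 + "H" * 128             # earlier request: system prompt + history
appended = cached + "N" * 64               # new content added at the end only
edited = "s" + cached[1:] + "N" * 64       # a single character changed at the start

reused_when_appended = cached_prefix_bytes(cached, appended)
reused_when_edited = cached_prefix_bytes(cached, edited)
```

In this toy setup, appending reuses the whole earlier prompt, while changing the first character reuses nothing, because every later fingerprint depends on all the bytes before it.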
Most coding agents on the market were designed for the "no-cache era", with a single optimization goal: minimize the total number of tokens sent.
So, to save money, these agents dynamically compress the conversation history and delete intermediate reasoning they consider useless. Or, to keep the model focused, they reposition the system prompt on every turn.
However, these seemingly clever optimizations break the continuity of the prefix. Once a minor change breaks the exact match, a cache worth millions of tokens that could have been hit is gone at once. It is a classic case of "dropping the watermelon to pick up the sesame seed": giving up a 10,000-token cache hit to trim 100 tokens of length.
Reasonix's solution, which can be called the "Append-Only Loop", looks somewhat clumsy from a traditional perspective.
Simply put, it follows one ironclad rule throughout the model's operating cycle: never reorder, compress, or modify history. Whether it is a tool call result or user feedback, everything is appended to the end like entries in a ledger. The consequence is that the context sent grows longer and longer as the conversation progresses.
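The append-only rule can be sketched as a tiny data structure. This is an illustration under assumptions, not Reasonix's code: the class name `AppendOnlyHistory`, the helper `shares_prefix`, and the message contents are all hypothetical.

```python
from dataclasses import dataclass, field

Message = tuple[str, str]  # (role, content)


@dataclass
class AppendOnlyHistory:
    """Conversation log that only grows at the end, so earlier entries never change."""
    _messages: list[Message] = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        self._messages.append((role, content))

    def snapshot(self) -> tuple[Message, ...]:
        """Immutable copy of the history as it would be sent with the next request."""
        return tuple(self._messages)


def shares_prefix(earlier: tuple[Message, ...], later: tuple[Message, ...]) -> bool:
    """True if `later` starts with every message of `earlier`, unchanged and in order."""
    return later[:len(earlier)] == earlier


history = AppendOnlyHistory()
history.append("system", "You are a coding agent.")
history.append("user", "Fix the failing test.")
first_request = history.snapshot()

history.append("tool", "pytest: 1 failed, 12 passed")
second_request = history.snapshot()

# A token-minimizing agent might instead drop an earlier message, which changes the prefix.
compressed_request = (first_request[0],) + second_request[2:]
```

Here `second_request` extends `first_request` exactly, so a prefix cache built from the first request would still apply. `compressed_request` is shorter, but it diverges at the second message and would miss the cache from that point on.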
The payoff, however, is remarkable. Because the prefix never changes, this very long context can always be "remembered" by the model. Even in coding sessions lasting several hours, the cache hit rate of Reasonix paired with DeepSeek V4 stays above 94%. In one extreme real-world test on GitHub Projects, the hit rate reached a striking 99.82%.
This is a precise cost calculation: when a cache hit on DeepSeek costs so little that it is almost negligible, the marginal cost of keeping a long context is far lower than the cold-start cost of re-sending everything after the cache has been invalidated.
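The trade-off can be made concrete with a back-of-the-envelope calculation. The prices below are hypothetical round numbers chosen only for illustration, not DeepSeek's published rates, and the token counts are invented. The 94% hit rate is the figure quoted above.

```python
def request_cost(prompt_tokens: int, hit_rate: float,
                 miss_price: float, hit_price: float) -> float:
    """Input cost of one request in dollars; prices are per million tokens."""
    hit_tokens = prompt_tokens * hit_rate
    miss_tokens = prompt_tokens - hit_tokens
    return (hit_tokens * hit_price + miss_tokens * miss_price) / 1_000_000


# Hypothetical prices, for illustration only.
MISS_PRICE = 1.00  # dollars per million uncached input tokens
HIT_PRICE = 0.10   # dollars per million cached input tokens

# Append-only agent: longer prompt, but 94% of it is served from cache.
append_only = request_cost(100_000, hit_rate=0.94,
                           miss_price=MISS_PRICE, hit_price=HIT_PRICE)

# Compressing agent: 40% fewer tokens, but the rewritten prefix misses the cache entirely.
compressed = request_cost(60_000, hit_rate=0.0,
                          miss_price=MISS_PRICE, hit_price=HIT_PRICE)
```

Under these assumed numbers the longer append-only request costs about $0.0154 while the shorter compressed request costs $0.06, so the "wasteful" prompt is roughly a quarter of the price. The exact ratio depends entirely on the real price gap between cached and uncached tokens.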
02
Recycling the Chain of Thought
Because Reasonix is a coding agent built specifically for DeepSeek, the benefits extend beyond the newly released V4 to the older R1 model.
R1 is the previous generation of reasoning model. Its best-known feature is that it writes out a chain of thought running to thousands of words inside the <think> tag. In real engineering work, however, this "reasoning first" mode poses two major challenges for an agent: thought leakage and syntax distortion.
As the name suggests, thought leakage means that R1 sometimes shows a strong "urge to act" while it is still thinking. An agent running R1 expects the tool call to be issued after thinking is complete. Because the reasoning chain is long, however, the model often writes tool call instructions inside the chain of thought itself.
Most agents recognize only the officially defined Tool Call block. "Premature" instructions inside the chain of thought are ignored as plain text, and in serious cases the conversation stalls.
Reasonix has designed a real-time scanning mechanism for this. Even if a tool call instruction escapes into the chain of thought, Reasonix can identify it, retrieve it, and reschedule it for execution.
This not only improves scheduling efficiency by 38% but, more importantly, saves expensive reasoning tokens: the model no longer has to think again because of a minor slip in its chain of thought.
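One way such a scan could work is sketched below. The article does not describe Reasonix's actual detection logic, so everything here is an assumption: the function name `leaked_tool_calls` is hypothetical, and the sketch assumes leaked calls appear as JSON objects with `name` and `arguments` keys inside the <think> block.

```python
import json
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def leaked_tool_calls(model_output: str) -> list[dict]:
    """Find JSON objects shaped like tool calls inside <think>...</think> text."""
    decoder = json.JSONDecoder()
    calls = []
    for block in THINK_BLOCK.findall(model_output):
        pos = block.find("{")
        while pos != -1:
            try:
                # raw_decode parses one JSON value starting at pos and reports where it ended.
                obj, end = decoder.raw_decode(block, pos)
            except json.JSONDecodeError:
                pos = block.find("{", pos + 1)
                continue
            # Assumed shape of a tool call: {"name": ..., "arguments": {...}}
            if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
                calls.append(obj)
            pos = block.find("{", end)
    return calls


output = (
    '<think>I should look at the file first. '
    '{"name": "read_file", "arguments": {"path": "main.py"}} '
    'Then I can decide.</think>Done.'
)
recovered = leaked_tool_calls(output)
```

The recovered objects could then be handed to the normal tool scheduler instead of being discarded as plain text. A production scanner would also need to work on streamed output and handle whatever call syntax the model actually emits.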
Syntax distortion is also easy to understand. Even when the model initiates a tool call correctly, the fragility of JSON remains a nightmare for the agent. A single extra comma or missing quotation mark in the model output brings the agent to a halt.
In the aforementioned "Append-Only Loop" mode, if a tool call fails because of a syntax error, the agent has to feed the error message back to the model, which then regenerates its output. Several losses quietly accumulate along the way: the error message pollutes the context, the regenerated response undermines the determinism of the fingerprint, and the cache advantage shrinks sharply.
So Reasonix adopts a "self-healing" approach: before an instruction reaches the executor, Reasonix puts it through a round of constraint-aware repair. Like a senior programmer fixing bugs, it automatically fills in missing symbols, corrects formatting, and reorders fields.
With this repair step, the tool execution failure rate drops below 3%. The conversation history stays "clean" and correct, and the prefix cache keeps growing like a snowball.
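A minimal sketch of this kind of repair pass follows. It is not Reasonix's implementation: the function name `repair_tool_arguments` is hypothetical, and it covers only three fixes (an unterminated string, unclosed brackets, and trailing commas), not the field reordering mentioned above.

```python
import json
import re
from typing import Optional

TRAILING_COMMA = re.compile(r",\s*([}\]])")


def repair_tool_arguments(raw: str) -> Optional[dict]:
    """Parse tool-call arguments, applying conservative fixes if the first parse fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    fixed = raw.strip()

    # Track open brackets and strings so that braces inside string values are ignored.
    stack, in_string, escaped = [], False, False
    for ch in fixed:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()

    if in_string:
        fixed += '"'                      # close an unterminated string
    fixed += "".join(reversed(stack))     # close brackets, innermost first

    # Naive: this also rewrites a comma followed by a bracket inside a string value.
    fixed = TRAILING_COMMA.sub(r"\1", fixed)

    try:
        return json.loads(fixed)
    except json.JSONDecodeError:
        return None  # give up; the caller falls back to reporting the error


extra_comma = repair_tool_arguments('{"path": "main.py", "line": 3,}')
missing_quote = repair_tool_arguments('{"path": "main.py')
unfixable = repair_tool_arguments("not json at all")
```

When the repair succeeds, the agent can execute the call without appending an error message and a regenerated response to the history, which is what keeps the prefix stable.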
03
The Hegemony of the Passive Ecosystem
Back to where this started: DeepSeek's permanent price cut is a coding carnival for developers, but a bolt from the blue for competitors.
A not very rigorous but brutally clear business formula has emerged:
Dominance of an AI product = (native capability of the model + gaps filled by community engineering) / user migration cost.
Obviously, in today's AI industry, if a model's performance reaches more than 90% of its competitors' and its price is only 1/10 of theirs, a devastating substitution effect will naturally occur.
Recent weeks brought the Baidu AI Developer Conference and the Alibaba Cloud Summit in China, and Google I/O 2026 abroad. All of these companies are trying to fold their various AI products into a single entry point and build an insurmountable ecosystem moat.
In contrast, DeepSeek has no cloud platform like Baidu Cloud or Alibaba Cloud, no globally deployed services like Google's YouTube and Gmail, and not even multimodal features.
Yet it has proven a logic that developers worldwide respect: keep capabilities in the top tier domestically, push cost-effectiveness to the extreme, and usage will follow naturally. The open-source community will fill in the remaining features.
In the past, large companies assumed that an ecosystem is built from the top down. We saw this "walled garden" approach in the Doubao mobile assistant and the Qianwen app in the early days of the agent era.
Reasonix has proven the power of the passive ecosystem. It is not a commercial product like Claude Code or Codex, but a stronghold that developers built spontaneously, specifically for DeepSeek.
Why are developers willing to write runtime optimization logic for DeepSeek alone? The answer is simple: DeepSeek leaves developers everywhere enough room to benefit. With expensive models, whether domestic or foreign, engineering optimization at the developer level cannot offset the cost of token consumption. With DeepSeek, every optimization translates directly into "freedom to experiment".
This is the power reversal brought about by open source.
Admittedly, DeepSeek still trails the world's top models. But once a model's API is cheap enough, V4 stops being just a model and becomes widely used AI infrastructure, and the community spontaneously makes up for its shortcomings. Liang Wenfeng's team may not have time to build the best possible TUI, but there will always be a team like Reasonix, leading a band of "actuaries", to fill the gap quickly.
This incentive-driven ecosystem evolves much faster than the all-in-one products built inside large companies.
04
The Displacement of the Evaluation System
At last, Chinese AI can hold its head high and join the agentic coding race.
If we cannot use Opus 4.7 in Claude Code or GPT-5.5 in Codex, as developers abroad do, we can use DeepSeek V4 in Reasonix.
Amid the joy and pride, an easily overlooked shift is under way: competition in AI has become a competition of "model + coding agent".
Many AI vendors at home and abroad tend to cram every function into one user interface, but Reasonix has chosen a vertical route like Claude Code: focus only on programming and go deep into the terminal. It has stayed out of the IDE plugin arms race and instead developed its own cell-diff renderer based on Yoga. Although the R&D team also offers a desktop version with a lower barrier to entry, the focus is clearly on delivering the best possible interaction in the terminal.
In the evaluation system of Artificial Analysis, efficiency and cost have become the core weights.
It goes without saying how expensive the Anthropic and OpenAI product combinations are. A $20 monthly subscription often falls short of what developers need. With Reasonix + DeepSeek, by contrast, 400 million tokens cost only $12 (based on DeepSeek's international pricing).
This extremely low cost brings not only freedom to experiment but also a thriving multi-agent collaboration ecosystem. Users can generate task execution plans in batches without worrying about a runaway bill. That psychological freedom makes it possible for AI to enter large-scale productive use.
The emergence of Reasonix signals that the agent field is shifting from showy tricks to precise accounting. Competition in the AI era has moved down to the cache fingerprint of every byte and the error correction of every tool call.
DeepSeek has turned computing power and wisdom into cheap tap water that everyone can use. And Reasonix has become the first faucet with high efficiency and low loss.
This article is from the WeChat official account "Silicon-based Starlight", author: Si Qi. Republished by 36Kr with authorization.