
DeepSeek will not bow to capital.

全天候科技 · 2026-04-20 10:43
Stay true to the original aspiration, and the journey of wisdom is long.

On April 17, The Information reported that DeepSeek is in talks with institutions to raise at least $300 million at a valuation exceeding $10 billion.

The news spread like wildfire and caused quite a stir.

Liang Wenfeng and his DeepSeek have always been an outlier in China's AI field. While other startups court investment, he has consistently refused financing and heavy commercialization, turning away approach after approach from investors.

So when the financing news broke, many reactions were reflexive. Some voices held that Liang Wenfeng could no longer hold out, that DeepSeek was short of money and attempting a commercial pivot, and that the once-proud idealist had finally bowed to capital and reality.

This conclusion seems straightforward, but unfortunately, it's wrong.

Because it uses the logic of the previous stage to explain an industry that has entered the next stage.

Magic Square's Ammunition

Let's first answer the most intuitive question: Is DeepSeek really short of money?

On the surface, it's not.

In 2025, Magic Square Quantitative (known in English as High-Flyer), the quantitative fund behind DeepSeek, achieved an average return of 56.6%, ranking second among domestic quantitative private-equity firms managing more than 10 billion RMB, behind only Lingjun Investment.

According to a widely circulated industry estimate, with 70 billion RMB under management, a 1% management fee, and a 20% performance commission, annual income comes to roughly 5 billion RMB.
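As a back-of-envelope check, the arithmetic below reconstructs that figure; the net-return number is an illustrative assumption implied by the roughly 5-billion total, not a figure from the article.

```python
# Back-of-envelope reconstruction of the "~5 billion RMB" estimate (all RMB).
aum = 70e9                          # management scale: 70 billion
management_fee = 0.01 * aum         # 1% management fee -> 0.7 billion

assumed_net_return = 0.30           # illustrative assumption, not a reported figure
performance_fee = 0.20 * aum * assumed_net_return   # 20% of gains -> 4.2 billion

total = management_fee + performance_fee
print(f"estimated annual income: {total / 1e9:.1f} billion RMB")   # ~4.9 billion
```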

Even after Magic Square's operating costs, channel revenue sharing, tax friction, and other deductions, the funds actually available for DeepSeek's R&D can still reach at least hundreds of millions of RMB.

That is no small war chest, and the money comes with unusually few strings attached: Magic Square stopped raising external funds years ago, and Liang Wenfeng holds the majority of its equity.

DeepSeek's R&D funds come straight out of Magic Square's R&D budget, free of interference from outside shareholders, LPs, or a board of directors. There are no reports to show anyone and no one's exit plan to answer for.

Hence the puzzle: why would a model company backed by stable cash flow start seeking financing?

The Agent Account Book Has Changed

The answer may lie in a rapid paradigm shift under way in the AI industry, from the chatbots that took off three years ago to the current AI Agent craze.

For the past two years, the core narrative of large models was "train once, call repeatedly": spend millions of dollars training a model, deploy it, and let users ask questions; one round trip is one inference. Compute consumption concentrated in the training stage, and inference costs stayed relatively controllable.

But the logic of an Agent is completely different. It executes an entire task chain: planning, tool calls, interaction with the environment, and backtracking and correction after failures.

The inference tokens an Agent consumes to complete a complex task may be dozens or even hundreds of times those of the Chatbot era. Inference cost is approaching training cost, and both are expanding exponentially.
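A rough sketch of why the ledger changes, using hypothetical order-of-magnitude token counts (none of these figures are DeepSeek's):

```python
# Hypothetical order-of-magnitude comparison, not measured figures.
# A chatbot answers in one round trip; an agent runs a multi-step task chain,
# re-reading its growing context at every planning / tool-call / retry step.

chatbot_tokens = 2_000                        # one question, one answer

steps = 40                                    # plan, call tool, observe, correct...
avg_context_per_step = 20_000                 # accumulated history reprocessed each step
agent_tokens = steps * avg_context_per_step   # 800,000 tokens for one task

print(f"agent / chatbot: {agent_tokens / chatbot_tokens:.0f}x")   # ~400x
```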

Meanwhile, parameter counts keep climbing. In late 2024, V3 stood at 671B parameters; industry speculation puts Claude Opus 4.6 at as many as 5T.

The parameter scale at the industry frontier is crossing from the hundreds-of-billions level to the trillions level, which means the computing power, data, and engineering complexity required for a single training run are all rising steeply.

DeepSeek was able to train V3 for just over $5 million thanks to extreme methodological efficiency, including the MoE architecture, multi-head latent attention (MLA), and fine-grained expert routing. Each innovation squeezes the maximum performance out of limited computing power.
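For readers unfamiliar with the mechanism, here is a minimal, generic sketch of top-k expert routing, the idea that lets an MoE model activate only a fraction of its parameters per token. It illustrates the technique, not DeepSeek's actual implementation:

```python
import torch
import torch.nn.functional as F

def moe_forward(x, experts, gate, k=2):
    """Top-k expert routing: each token activates only k of the n experts.

    x:       (tokens, d_model) activations
    experts: list of n small feed-forward networks
    gate:    linear layer mapping d_model -> n routing logits
    """
    probs = F.softmax(gate(x), dim=-1)            # (tokens, n) routing probabilities
    weights, idx = torch.topk(probs, k)           # keep the k best experts per token
    weights = weights / weights.sum(-1, keepdim=True)   # renormalize over the chosen k
    out = torch.zeros_like(x)
    for j, expert in enumerate(experts):
        mask = (idx == j).any(-1)                 # tokens that routed to expert j
        if mask.any():
            w = weights[mask][idx[mask] == j].unsqueeze(-1)
            out[mask] += w * expert(x[mask])      # weighted contribution of expert j
    return out

# Tiny usage example: 8 experts, but only 2 run per token.
d, n = 64, 8
experts = [torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                               torch.nn.Linear(4 * d, d)) for _ in range(n)]
gate = torch.nn.Linear(d, n)
y = moe_forward(torch.randn(10, d), experts, gate)
```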

This is a strategy of winning by skill over brute force, but there is a premise: the stakes on the table cannot rise too fast.

Now the stakes have risen.

In the Agent era, model training is becoming a continuously iterating flywheel. The model must repeatedly try and fail in complex environments, accumulate feedback, and improve itself; the demand for data, computing power, and engineering resources rises exponentially.

Magic Square's 5-billion-RMB income was more than enough in the Chatbot era. But in the Agent era, it may not be sufficient.

The People Who Were Poached

The exponential rise in compute demand is the obvious problem; the talent drain is the more painful, hidden one.

According to LatePost, at least five core R&D members have left DeepSeek since R1's breakout success, spanning four core technical lines: base models, inference, OCR, and multimodality:

Wang Bingxuan, a core author of the first-generation large language model, went to Tencent. Luo Fuli, a key contributor to the V3 model, went to Xiaomi to head its AI department. Guo Daya, a core R1 researcher and first author of DeepSeek-Coder, joined ByteDance's Seed team as one of the heads of its Agent team. Wei Haoran, core author of the OCR series, and Ruan Chong, a core contributor to the multimodal work, also departed.

DeepSeek's core R&D team numbers only about a hundred people. Losing five core members means simultaneous breakpoints across all four technical lines, and it exposes structural problems in the talent mechanism.

DeepSeek's organization is extremely flat, with just two levels: Liang Wenfeng and the researchers. There is no clocking in, no performance reviews, and no fixed KPIs or deadlines. More than 70% of the team is under 30, and more than 70% hold bachelor's or master's degrees.

An algorithm engineer close to DeepSeek said that anyone on the team with an idea can form an internal group to explore it, something hard to pull off at large companies with layered hierarchies.

The model suits frontier exploration, but it has a fatal shortcoming: the lack of a mature equity-incentive system.

As early as 2023, Liang Wenfeng approached investors with an agreement resembling the "capped return" arrangement between OpenAI and Microsoft, but no institution accepted. And because DeepSeek has never raised financing and its shares have never been priced, the stock options granted to employees are hard to treat as a concrete incentive.

In this situation, when ByteDance offers packages of cash plus ByteDance and Doubao stock options, when Xiaomi waves annual salaries in the tens of millions of RMB, when Alibaba even dangles the post-training lead role, and when some competitors poach at two to three times current income, it is hard for DeepSeek's roster of post-95 researchers not to face a talent dilemma.

Moreover, Zhipu and MiniMax have gone public one after the other, and their rising share prices have produced a considerable wealth effect. In this environment, an unpriced, non-tradable stock-option agreement is less and less persuasive.

Seen this way, financing is not only about stockpiling computing power; it is also about pricing employees' stock options. A valuation brings certainty, and only with certainty can DeepSeek compete with the poaching machines of the big companies.

Liang Wenfeng clearly recognizes this. DeepSeek is pushing forward its valuation work, clarifying stock-option pricing, and trying to give the team more certainty.

Besides Money, There Are Also Scenarios

"Giving a valuation" and "buying computing power" are not the only goals of financing. The more crucial thing may be the development direction of Agent.

In March 2026, DeepSeek released 17 new recruitment positions at once.

The most eye-catching are three Agent-dedicated positions: Agent deep-learning algorithm researcher, Agent data evaluation researcher, and Agent infrastructure engineer.

The algorithm researcher is to explore reinforcement learning for large-model alignment and capability improvement, covering directions such as RLHF, process rewards, and preference learning.

The data evaluation researcher is to build evaluation datasets and design test cases for core Agent capabilities such as planning, tool calling, multi-round interaction, and long-term memory.

The infrastructure engineer is to build the underlying platform Agents run on, and must be familiar with Agent interaction protocols such as MCP, Tool Use, and Function Calling.

Even the model strategy product manager role has a dedicated Agent track, requiring familiarity with core Agent mechanisms such as Tool Use, Planning, long-term memory, and Multi-Agent collaboration.
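To make "Tool Use" concrete, here is a minimal sketch of the loop these roles revolve around. The message format, the call_model contract, and the placeholder tools are all hypothetical, standing in for a real function-calling interface rather than describing DeepSeek's:

```python
# Hypothetical tool registry; real systems expose tools via schemas
# (function-calling APIs, MCP). `search` and `calc` are placeholders.
TOOLS = {
    "search": lambda q: f"top result for {q!r}",
    "calc": lambda expr: str(eval(expr)),   # demo only; never eval untrusted input
}

def run_agent(task, call_model, max_steps=10):
    """Generic plan -> act -> observe loop.

    call_model is any function mapping the message history to either
    {"tool": name, "args": ...} or {"answer": text} -- a made-up contract
    standing in for a real function-calling API.
    """
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(history)                   # model decides the next step
        if "answer" in reply:                         # task chain complete
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])  # execute the requested tool
        history.append({"role": "tool", "content": result})  # feed the observation back
    return None   # step budget exhausted
```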

Compare that with January this year, when DeepSeek's core openings were still concentrated in general research directions such as "Deep Learning Researcher - AGI".

In just two months, DeepSeek's recruitment focus has obviously shifted from basic model research to Agent productization.

A conjecture was once popular in the industry: once the model is strong enough, Agent capability follows naturally, and the ever-evolving model will eventually swallow the Agent's value.

But as we emphasized in "The Bitter Awakening of Agent: Intelligence is Moving from Language to Experience", the complementarity between model and Agent runs both ways.

An Agent's exposure to rich, complex scenarios in turn strengthens the model. The model must be tempered in real task chains, making mistakes and accumulating feedback, to acquire abilities that cannot be trained in a laboratory.

For example: memory that persists beyond the context window, reliable multi-tool scheduling, and autonomous planning in the face of ambiguous instructions.

The growth of these abilities depends on rich, complex and real application scenarios and training environments.

On March 26, Lin Junyang, who had left Alibaba three weeks earlier, offered a sharper formulation in his long technical essay "From 'Reasoning' Thinking to 'Agentic' Thinking":

"In the SFT (Supervised Fine - Tuning) era, we were obsessed with data diversity; in the Agent era, we should be obsessed with the quality of the environment: stability, authenticity, coverage, difficulty, state diversity, feedback richness, anti - exploitation ability, and the scalability of rollout (completely executing a process)."

But where does a good environment come from?

One way is to build it yourself, which takes time and resources.

The other is to obtain it through strategic investors' industrial networks. An investor with an enterprise (B-end) customer ecosystem can directly supply Agent training scenarios in verticals such as finance, office work, and software development. The value of this "scenario synergy" far exceeds the money itself.

From this perspective, the financing is less a blood transfusion than access: access, above all, to industrial resources that Magic Square's profits alone cannot buy.

The Unchanged Geeky Nature

Another voice in the market says DeepSeek has been caught up with over the past two years, that domestic giants have gradually closed the gap with R1 and V3, and that it now needs financing to keep pace with competitors.

This judgment only sees the surface but ignores the core.

DeepSeek is not positioned as a traditional commercial company, but as a frontier artificial-intelligence laboratory.

Its value need not show up in consumer product experience or hallucination control, nor be proven by any single model's benchmark scores. Its core influence lies in how many reusable methodologies it has exported to the industry.

On New Year's Day 2026, DeepSeek published a new paper on arXiv proposing the mHC (Manifold-Constrained Hyper-Connections) architecture, which tames the instability of hyper-connection architectures in large-scale training by constraining the connection weight matrix to the manifold of doubly stochastic matrices. The first authors include Xie Zhenda and Wei Yixuan, and Liang Wenfeng himself appears among the 19 authors.

The paper touches the oldest, most basic part of the Transformer architecture: the residual connection, proposed by Kaiming He in 2015 and left essentially unchanged for a decade.
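A doubly stochastic matrix is one whose rows and columns each sum to 1, which keeps the mixing of residual streams from systematically amplifying or attenuating the signal. The classic way to reach that manifold is Sinkhorn-style row/column normalization; the sketch below illustrates the construction itself, not necessarily the paper's exact procedure:

```python
import torch

def sinkhorn_project(w, iters=20):
    """Push a matrix toward the doubly stochastic manifold by alternately
    normalizing rows and columns (Sinkhorn-Knopp iteration)."""
    m = torch.exp(w)                           # ensure strictly positive entries
    for _ in range(iters):
        m = m / m.sum(dim=1, keepdim=True)     # make rows sum to 1
        m = m / m.sum(dim=0, keepdim=True)     # make columns sum to 1
    return m

m = sinkhorn_project(torch.randn(4, 4))
print(m.sum(dim=0), m.sum(dim=1))              # both approximately [1, 1, 1, 1]
```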

Looking back further: in September 2025, DeepSeek-V3.2 introduced the DSA (DeepSeek Sparse Attention) mechanism. Cambricon announced adaptation support just four minutes after the release, and the architecture was later borrowed by Zhipu's GLM-5.

In October 2025, DeepSeek released its first-generation OCR model, achieving efficient document recognition through context-based optical compression; in January 2026 it launched DeepSeek-OCR-2, introducing a visual causal flow that, at 3B parameters, outperforms models tens of billions of parameters in size while staying robust to watermarks. Since being open-sourced, the series has been widely adopted in industry, its techniques cited directly in academic research at institutions including the University of Pittsburgh and Princeton University, reshaping the technical route of OCR and multimodal visual understanding.

Best known of all, of course, is the GRPO (Group Relative Policy Optimization) algorithm that Guo Daya proposed during his time at DeepSeek, later applied directly in R1's training and recognized across the industry as a key methodological innovation.
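The core of GRPO fits in a few lines: sample a group of responses to the same prompt, then score each one against the group's own mean and standard deviation, removing the need for a separate critic model. A simplified sketch of that group-relative baseline:

```python
import torch

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: each sampled response to the same prompt is
    scored against its own group's mean and std, so no critic model is needed.

    rewards: (group_size,) scalar rewards for one prompt's sampled responses.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rewards = torch.tensor([0.1, 0.9, 0.4, 0.7])   # e.g. four sampled answers
print(grpo_advantages(rewards))                # above-average answers get positive advantage
```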

In open-source models, DeepSeek outputs not just model weights but a technical roadmap the industry repeatedly cites and follows.

For a team that keeps producing methodologies the industry follows, one or two rounds of financing do not signal a fundamental change of its AGI course.

The Mirror across the Ocean

On the question of whether financing equals compromise, there is a ready-made mirror across the ocean: Anthropic.

In 2024, Anthropic's annualized revenue was still around $1 billion. By the end of 2025 it had risen to $9 billion, and by April 2026 it had reached $30 billion.

The climb from $1 billion to $30 billion took about 15 months, and Anthropic achieved it without leaning on consumer-side scale: its consumer user base is only about 5% of ChatGPT's. Instead it relies on enterprise API contracts, developer subscriptions, and Claude Code, which, launched less than a year earlier, had already passed $2.5 billion in annualized revenue by February 2026.

What's more noteworthy is the efficiency difference.

OpenAI's annual training cost is projected to reach $125 billion by 2030, against Anthropic's forecast of about $30 billion for the same period: a roughly fourfold cost gap in the same race. OpenAI is expected to lose $14 billion in 2026, while Anthropic expects positive free cash flow in 2027.

Anthropic has proved one thing: commercialization and model improvement are not a zero-sum trade.

Funding itself through enterprise APIs has not stopped it from becoming a top-tier model provider; on the contrary, it sustains continuous R&D investment free of outside constraint.

DeepSeek need not follow Anthropic's path, but the example at least shows a possibility: limited commercialization can supply more fuel for frontier research and exploration.

Frontier AI R&D is, at its core, a highly uncertain exploratory activity, completely different from building apps, running e-commerce, or operating content feeds. The latter have clear user indicators,