How many "Guo Dayas" does DeepSeek have? After analyzing 27 papers, we found a group of "polygonal warriors".
Over the past year, there has been a steady stream of news about talent leaving DeepSeek: first the early departure of Luo Fuli, then the successive exits of Wang Bingxuan, the original author of the first-generation large model; Ruan Chong, a key figure in multimodality; and Guo Daya, the core author of R1.
With the successive poaching of core authors, will DeepSeek's technological barriers loosen?
We decided to approach this problem from a different perspective.
Using Codex and Python, we went through 27 core papers and technical reports published by DeepSeek in the past two years and broke down the author list of each one. For large-scale technical reports such as DeepSeek V2, V3, V3.2, and V4, which list authors by role, we kept only the Research & Engineering list; for the remaining papers, we used the original author lists. The result is a pool of 328 R&D authors.
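The filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the analysis: the record layout (`sections`, `authors`), the helper names, and the placeholder author names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical paper records; field names and author names are placeholders.
papers: List[Dict] = [
    {
        "title": "Flagship report (hypothetical)",
        "sections": {
            "Research & Engineering": ["Author A", "Author B"],
            "Data Annotation": ["Author C"],
        },
    },
    {
        "title": "Small paper (hypothetical)",
        "authors": ["Author B", "Author D"],
    },
]

RND_SECTION = "Research & Engineering"


def rnd_authors(paper: Dict) -> List[str]:
    """Return the authors of one paper that count toward the R&D pool."""
    sections = paper.get("sections")
    if sections and RND_SECTION in sections:
        # Role-separated report: keep only the Research & Engineering list.
        return list(sections[RND_SECTION])
    # Any other paper: fall back to the full author list.
    return list(paper.get("authors", []))


def build_pool(all_papers: List[Dict]) -> Set[str]:
    """Union of R&D authors across all papers, deduplicated by name."""
    pool: Set[str] = set()
    for paper in all_papers:
        pool.update(rnd_authors(paper))
    return pool


pool = build_pool(papers)
print(sorted(pool))
```

In practice the hard part is name normalization (spelling variants, duplicate names), which this sketch does not attempt.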
Jiazi Guangnian found that DeepSeek's R&D team and internal structure have the following characteristics:
No departmental barriers. Among the 328 R&D authors, 168 have formed stable, repeated collaborations, for a total of 319 collaborative connections.
Efficient breakthroughs through "corps + teams". One large base model corps works alongside six elite special teams in areas such as system efficiency, mathematics and reasoning, multimodality, caching and systems, vertical mathematics, and OCR vision.
A concentration of researchers from top-tier universities. Nearly 40% of DeepSeek's Top 25 R&D authors are from Peking University.
No restrictions on R&D. More than half of DeepSeek's R&D authors work across domains, and 79 of them span three or more directions. Researchers assemble dynamically around interests and problems.
Papers focus on underlying issues: how to make better use of computing power, how to reduce cache costs when processing long contexts, and how to train stably as models grow larger.
The co-authorship network of DeepSeek's core papers. Each node in the figure represents a research author, and each connection represents a co-authorship relationship. Charted by Jiazi Guangnian.
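One way to derive such a network from author lists is to count how often each pair of authors shares a byline and keep only the pairs that recur. The sketch below assumes that "stable" means at least two shared papers; that threshold, the function name, and the toy data are our assumptions, since the exact cutoff behind the 319 connections is not stated.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(bylines, min_shared=2):
    """Count shared papers per author pair and keep the recurring pairs.

    min_shared is an assumed threshold for a "stable" collaboration.
    """
    pair_counts = Counter()
    for authors in bylines:
        # Sort so each unordered pair always gets the same key.
        for a, b in combinations(sorted(set(authors)), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}


# Toy bylines with placeholder names.
bylines = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "D"],
    ["D", "E"],
]

edges = coauthor_edges(bylines)
connected = {name for pair in edges for name in pair}
print(edges)
print(sorted(connected))
```

Here only A-B and B-C recur, so three of the five toy authors end up inside the network. Note that a 200-author report generates nearly 20,000 pairs on its own, so a real analysis would likely need a stricter threshold or a weighting scheme.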
After analyzing DeepSeek's 27 papers, Jiazi Guangnian believes that DeepSeek's approach can be summarized as: don't pile up compute cards, don't chase leaderboard rankings; verify first, then integrate; focus on system efficiency to break through compute constraints. Notably, almost none of these 27 papers is aimed at scoring on benchmarks. They all address specific engineering bottlenecks.
1. Where do those poached people actually rank?
DeepSeek's 27 papers mainly cover seven technical directions: base models, systems/efficiency, mathematics/proof, multimodality, code, OCR, and reasoning/reinforcement learning.
We examine two dimensions: the number of papers an author participated in and the breadth of technical directions covered. Note that both indicators are based on byline statistics and do not represent the size of a contribution or a position in the organization. We call R&D authors who cover three or more technical directions "polygonal warriors".
What is this number? 79 people.
Let's see where the names rumored to be highly sought-after rank in the network.
Ruan Chong does rank first, with 18 papers across six directions. He is almost everywhere, from the MoE architecture to mathematical proofs to multimodality.
He completed both his undergraduate and graduate studies at Peking University. He worked on NLP R&D early in his career and joined DeepSeek in 2023. He participated in the work on DeepSeek-VL, V3, and R1, and is the corresponding author of VL2. In January this year, he joined Yuanrong Qixing as chief scientist.
Guo Daya participated in 11 papers, covering four directions, and is tied for 12th among high-frequency R&D authors. Wang Bingxuan participated in 10 papers, covering five directions, and is tied for 17th.
They are indeed core members, and their departure is certainly a loss. But the key question is: How many more "Guo Dayas" and "Wang Bingxuans" does DeepSeek have?
There are 24 R&D authors who have participated in 10 or more papers. Even with three of them gone, 21 people with a similar level of participation remain.
If we think of DeepSeek as a football team, several core players have been poached, but the squad is deeper than expected.
Top 25 high-frequency R&D authors. The statistics cover the R&D author pool. Paper counts and direction counts do not represent a ranking of contributions. Charted by Jiazi Guangnian.
What deserves more attention is the "cross-domain" pattern. Among the 328 R&D authors, 158 have appeared in only one direction. The remaining 170 have crossed at least two directions, and 79 of them have spanned three or more.
Take the most extreme example. Li Yukun participated in 14 papers spanning all seven directions, from the first-generation DeepSeek LLM to the latest V4. His Google Scholar citation count exceeds 20,000. He is DeepSeek's "first employee": he joined in 2023 after leaving the ByteDance search team and is responsible for work related to pre-training data.
This confirms a fact often overlooked by the outside world. In the AI industry, talent has always been flowing in multiple directions, and DeepSeek is also poaching people from other places.
The distribution of the number of technical directions covered by DeepSeek's R&D authors. The number of covered directions is calculated based on seven technical directions. Charted by Jiazi Guangnian.
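The breadth statistic is simple to reproduce once each paper is tagged with one of the seven directions. The following is a minimal sketch on toy data; the `(authors, direction)` record format, the function name, and the placeholder author names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def direction_breadth(records):
    """Map each author to the number of distinct directions they appear in."""
    seen = defaultdict(set)
    for authors, direction in records:
        for author in authors:
            seen[author].add(direction)
    return {author: len(dirs) for author, dirs in seen.items()}


# Toy (authors, direction) records with placeholder names.
records = [
    (["A", "B"], "base models"),
    (["A", "C"], "systems/efficiency"),
    (["A", "B"], "mathematics/proof"),
    (["B"], "base models"),
    (["D"], "OCR"),
]

breadth = direction_breadth(records)
histogram = Counter(breadth.values())  # directions covered -> number of authors
polygonal = sorted(a for a, n in breadth.items() if n >= 3)

print(dict(histogram))
print(polygonal)

# Sanity check on the reported totals: 158 single-direction authors
# plus 170 multi-direction authors make up the pool of 328.
assert 158 + 170 == 328
print(round(170 / 328 * 100, 1))  # share of authors crossing directions, in %
```

The last line gives 51.8, which matches the statement that more than half of the R&D authors work across domains.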
2. How do polygonal warriors emerge?
The outside world always discusses whether DeepSeek still has geniuses.
Every AI company has its stars. What makes DeepSeek different is that it allows a group of very young people to quickly form teams, explore, and obtain resources among multiple technical directions with fewer constraints and restrictions.
During his internship at DeepSeek, Xin Huajian led the development of the DeepSeek-Prover series of models focusing on mathematical proofs. He is also the first author of the DeepSeek-Prover-V1.5 paper. He once told Jiazi Guangnian that Prover was initially just an independent exploration project within the company, with the original intention of verifying whether more rigorous reasoning data could be constructed through a formal system.
Most large companies first set up departments, define KPIs, allocate budgets, and then launch projects. DeepSeek does the opposite: First, someone thinks a problem is worth solving, and then they find people and resources around this problem.
In the paper collaboration network, the traces left by this "team-building" method are very clear. Clustering by co-authorship reveals four relatively concentrated groups (the large base model corps, system efficiency, mathematics and reasoning, and multimodality) plus three smaller collaboration clusters. Note that these "groups" do not correspond to DeepSeek's real departments; they only reflect who often works with whom.
The distribution of the collaboration network of DeepSeek's R&D authors. The collaboration groups are identified based on stable co-authorship relationships. Charted by Jiazi Guangnian.
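The clustering algorithm behind the chart is not specified. As a deliberately simple stand-in, the sketch below groups authors by connected components over a list of stable co-authorship pairs, using a small union-find. A real analysis would more likely use a community-detection method, which can split one large connected network into several groups; the edge list and names here are hypothetical.

```python
def collaboration_groups(edges):
    """Group authors into connected components of the co-authorship graph."""
    parent = {}

    def find(x):
        # Find the representative of x, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Largest group first; ties broken alphabetically for stable output.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Toy stable-collaboration pairs with placeholder names.
stable_edges = [
    ("A", "B"), ("B", "C"),              # a three-person group
    ("D", "E"),                          # a pair
    ("F", "G"), ("G", "H"), ("H", "F"),  # another three-person group
]

groups = collaboration_groups(stable_edges)
for group in groups:
    print(sorted(group))
```

Connected components are the coarsest possible grouping: any single shared byline between two teams merges them, which is why this is only a starting point.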
Interestingly, this structure is highly consistent with the organizational method described by Liang Wenfeng.
Liang Wenfeng said: "We generally do not pre-assign tasks, but let the division of labor emerge naturally. Everyone has their own unique growth experience and comes with their own ideas, so there is no need to push them. When an idea shows potential, we will also allocate resources from the top down."
LatePost reported that DeepSeek has a very flat organizational structure. The research team generally has only two levels: Liang Wenfeng and researchers. "Sometimes, a new direction starts because three or five people all think an idea is good, and then they work on it together." Liang Wenfeng is more like a mentor: organizing R&D, coordinating resources, and signing as the corresponding author on joint work.
This way of organizing also has a feature that is very rare in the AI industry: no overtime. Most members leave the office between 6 and 7 pm on weekdays. There is no clock-in system and no explicit performance appraisal. Liang Wenfeng's logic is: "It is difficult for a person to work at a high-quality level for more than 6 to 8 hours a day. Foolish judgments made under overtime fatigue will actually waste precious computing resources, which is not worth it."
Jiazi Guangnian found that most of the authors of DeepSeek's papers hold bachelor's, master's, or doctoral degrees from universities such as Tsinghua University, Peking University, and the University of Science and Technology of China, and graduated around 2023. Nearly 40% of the top 25 high-frequency R&D authors graduated from Peking University.
But this should not be read as a simple strategy of stockpiling graduates from prestigious schools. Jiazi Guangnian learned that the recruitment orientation of many AI labs is changing, and PhD students are now favored over veterans from large companies.
The chairman of an AI company once told Jiazi Guangnian that since ChatGPT came out, he has been squeezing interviews with promising PhD students into his lunch breaks. He spends at least an hour going through even a candidate's smallest projects, from basic formula derivation to the handling of engineering details, to screen for real innovators. He noted that most people only turned to GPT-related architecture research in 2023, which puts everyone on the same starting line. "PhDs who graduated after this point have not been constrained by industry inertia and often bring unexpected breakthroughs."
Liang Wenfeng himself also said: "Those who developed DeepSeek V2 are 'fresh graduates from top universities, doctoral students in their fourth or fifth years who haven't graduated yet, and some young people who have only graduated for a few years'."
So, how stable is the DeepSeek team? Cross-referencing the bylines shows that 75 of the 86 authors of the first-generation model paper (January 2024) still appear on the V4 author list (April 2026). More than two years on, about 87% of the first-generation team remains.
Of the 269 authors on V4's Research & Engineering list, 10 are marked as having left, about 3.7%. By comparison, according to Z Finance, as of April this year about 60 to 70 members of ByteDance's Seed team had moved to various model companies over the past year.
These numbers do not equal DeepSeek's real attrition rate, but they show that the core R & D network has not fallen apart because of the departure of a few stars.
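Both stability figures are plain ratios over author lists. Below is a minimal sketch of the byline comparison on toy data, followed by the arithmetic on the counts reported above; the function name and the placeholder author names are ours.

```python
def byline_retention(first_authors, later_authors):
    """Share of the earlier paper's authors who also appear on the later one."""
    first, later = set(first_authors), set(later_authors)
    return len(first & later) / len(first)


# Toy bylines with placeholder names: three of four early authors reappear.
early = ["A", "B", "C", "D"]
late = ["A", "B", "C", "E", "F"]
toy_rate = byline_retention(early, late)
print(toy_rate)

# Arithmetic on the counts reported in the article.
first_gen_retained = round(75 / 86 * 100, 1)  # first-generation authors on V4
v4_marked_left = round(10 / 269 * 100, 1)     # V4 R&E authors marked as left
print(first_gen_retained, v4_marked_left)
```

The two ratios come out to 87.2% and 3.7%. As the text notes, a byline match measures continued authorship, not employment, so neither figure is a true attrition rate.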
3. 27 papers in two years, focusing on system efficiency
Judging by external influence alone, the base model reports such as V3 and V4 are the most eye-catching.
But the distribution of paper themes gives a somewhat counter-intuitive result: among the 27 papers, the largest category is not base models but systems/efficiency (7 papers), ahead of base models (5 papers) and mathematics (5 papers).
These seven papers are: DeepSeekMoE, ESFT, NSA, Insights into V3, mHC, Conditional Memory, and DualPath. None of them is for scoring on benchmarks. All of them are solving the same type of problem: How to do more with less computing power.
The timeline of DeepSeek's 27 papers in the past two years. The horizontal axis represents the number of unique authors of each paper or technical report, and the color represents the technical direction. Charted by Jiazi Guangnian.
By dissecting these papers one by one, we can see three types of underlying problems:
First, how to make better use of computing power. ESFT focuses on how to fine-tune models more economically, while Insights into V3 reviews how to improve hardware utilization and stability in large-scale cluster training.
Second, how to reduce cache costs when processing long contexts. When the model needs to process longer texts or perform complex Agent tasks, attention computation and the KV Cache (the intermediate memory in which the model stores historical context) become very expensive. NSA, Conditional Memory, and DualPath all try to compress the cost of the model "remembering history".
Third, how to train stably as the model grows larger. DeepSeekMoE explores activating fewer expert networks as the parameter scale grows; mHC tries to enhance signal propagation in deep networks and reduce instability when training ultra-large-scale models.
Liang Wenfeng once put forward a hypothesis: "Can we achieve all the current intelligence with a part of the existing computing power?" These seven system-related papers can be regarded as the DeepSeek team's continuous attempt to answer this question.
There is also a detail worth noting. The author counts of the 27 papers follow a rhythm that alternates between large and small teams. The base model reports often list 200 to 300 authors, while papers on systems, mathematics, and multimodality usually have only 6 to 20.
The former is like a large corps operation, and the latter is like a targeted breakthrough by a special team. A small team first verifies an idea at low cost, and once it succeeds, the idea is integrated into the next-generation flagship.
4. From R1 to V4, accumulating the trump card
If