
DeepSeek's Hiring Philosophy: Top Academic Achievers, Young People, No "Horse Racing" | Focus Analysis

Zhou Xinyu · 2025-01-09 09:30
Innovative things should be entrusted to newcomers.

Written by Zhou Xinyu

Edited by Su Jianxun

Luo Fuli, the "post-95 genius girl" whom Lei Jun personally recruited to Xiaomi and a former DeepSeek model trainer, has revealed the tip of the iceberg of DeepSeek's talent profile: young, outstanding fresh graduates.

It is this group of "mysterious wizards" (as Jack Clark, OpenAI's former policy director, described them) who, with only about 6 million US dollars, trained DeepSeek-V3, a model whose performance surpasses GPT-4o and Claude 3.5 Sonnet.

Liang Wenfeng, the founder of DeepSeek, gave a rough sketch of these employees in an interview with 36Kr: "They are all fresh graduates from top universities, PhD students in their fourth or fifth year who have not yet graduated, and young people only a few years out of school."

However, simply building a team of geniuses is not enough to realize DeepSeek's AGI ideal.

Through interviews with multiple people familiar with the matter, "Intelligent Emergence" found that for DeepSeek to make good use of these young geniuses, the way the team is managed is crucial.

As their headcounts expand rapidly, many AI companies have had to adopt more hierarchical, vertically managed models for the sake of efficiency.

DeepSeek, however, has kept its team at around 150 people since its founding in May 2023, and relies on an extremely flat culture that downplays job titles to decide research topics and mobilize resources.

And innovation has emerged precisely from this group of unproven young geniuses and a company organized unlike a typical internet firm.

A Hundred Young Geniuses, No Horse Racing, No Team Leads

Hiring seasoned veterans with AI experience is the recruitment strategy of most AI companies.

For example, Wang Xiaochuan brought his old Sogou team, colleagues of 20 years, to Baichuan Intelligence; Jiang Daxin, who came from Microsoft, likewise recruited his former colleagues from Microsoft Research Asia when he founded StepFun. And the initial co-founder roster of 01.AI (ZeroOneEverything) was star-studded, including:

Huang Wenhao from Microsoft Research Asia; Pan Xin, the first research software engineer at Google Brain and former head of ByteDance's AI platform; and Li Xiangang, former head of the Strategy and Algorithm Center at Beike Group.

But DeepSeek prefers young people without work experience.

A headhunter who has worked with DeepSeek told "Intelligent Emergence" that DeepSeek does not want senior technical personnel: "Three to five years of work experience is the ceiling, and candidates with more than eight years are basically rejected."

For example, three of the core authors of DeepSeekMath — Zhu Qihao, Shao Zhihong, and Wang Peiyi — completed the relevant research during their doctoral internships. Another example is Dai Damai, a V3 researcher who only received his doctorate from Peking University in 2024.

Dai Damai. Source: Internet

Absent work experience, competition results, alongside university pedigree, serve as DeepSeek's criteria for judging whether young graduates are "excellent". Several third-party institutions that have worked with DeepSeek also said the company attaches great importance to competition results: "Basically, anyone below a gold medal is not wanted."

One DeepSeek member once disclosed his resume online: a Peking University graduate with gold medals from three ACM/ICPC (International Collegiate Programming Contest) competitions. As an undergraduate he published 6 papers, two as co-first author, most of them at top conferences.

According to "Intelligent Emergence", in 2022 the quantitative fund High-Flyer (Huanfang) began building the AI team that would become DeepSeek. By May 2023, when DeepSeek was officially established, the team already had nearly 100 engineers.

Today, excluding the infrastructure team in Hangzhou, the Beijing team alone numbers about 100 engineers. The acknowledgments in the technical report show that 139 engineers participated in the DeepSeek-V3 research.

A team of 100 is dwarfed by the thousands-strong model armies at ByteDance, Baidu, and other companies. But in AI innovation, where "talent density" weighs far more than headcount, many people described DeepSeek to "Intelligent Emergence" as a team made up entirely of elites.

How does DeepSeek manage and retain these young geniuses? On one hand, by unabashedly throwing money and GPUs at them.

Insiders told "Intelligent Emergence" that DeepSeek benchmarks its salaries against ByteDance's R&D pay: "Whatever ByteDance offer a candidate can get, DeepSeek bids higher."

At the same time, as long as Liang Wenfeng judges a technical proposal to have potential, the computing power DeepSeek gives its people is "unlimited".

On the other hand, DeepSeek adopts a rather flat and "academic" management style.

The aforementioned headhunter said that no DeepSeek member leads a team; instead, people are divided into research groups according to specific goals, with no fixed division of labor and no reporting lines within a group. "Everyone handles the part they are best at. When they hit a wall, they discuss it together or consult experts from other groups."

Liang Wenfeng once described this organizational form as "bottom-up" and "natural division of labor" in an interview with 36Kr: "Everyone has their own unique growth experience and comes with their own ideas, and there is no need to push them... When an idea shows potential, we will also allocate resources from top to bottom."

In the industry, many entrepreneurs likewise regard "flatness" as the organizational model suited to innovative businesses. "Equal communication is very important for building a learning organization, and diluting job titles encourages everyone to speak freely," Wang Huiwen once told "Intelligent Emergence" when he founded the AI company Lightyear Beyond.

Greg Brockman, co-founder of OpenAI, has also mentioned that OpenAI makes no distinction between researchers and engineers; everyone carries the same title, "Member of Technical Staff". This means that "junior engineers" in the conventional sense can also take the lead on research projects.

A typical product of this "natural division of labor" is MLA (Multi-head Latent Attention), the key training architecture that significantly reduced V3's training cost. Liang Wenfeng mentioned that MLA originally grew out of a young researcher's personal interest: "We formed a team around it, and it took several months to get it working."

At the same time, there is no "horse racing" within DeepSeek, i.e. no pitting parallel teams against each other on the same problem. According to an AI practitioner who has dealt with the DeepSeek team, this is to avoid the waste of manpower and resources such competition causes. "It is also bad for retaining talent and forming team consensus. The internal friction created by a horse-racing mechanism is too severe."

"To Innovate, the Team Must Break Away from Inertia"

In 2023, the labels attached to China's top AI talent — academic heavyweights, senior executives at large companies, veteran entrepreneurs — all pointed to the same hiring standard: talent had to be validated by workplace markers such as job rank and product impact.

But clearly, since 2024, hiring standards in the AI industry have been changing. More young people, fresh out of school and not yet proven in the workplace, are coming to the fore.

Aditya Ramesh, one of the leads on Sora, said at the 2024 BAAI (Zhiyuan) Conference that OpenAI's recruitment strategy differs sharply from other organizations': "We pay more attention to people with high potential who may not have had the opportunity to earn formal academic credentials."

Similarly, Xie Saining, author of DiT (the underlying architecture of Sora), has noted that many very successful researchers never went through so-called traditional, formal research training.

The conversation between Xie Saining and Aditya Ramesh at the Zhiyuan Conference. Source: Zhiyuan

A similar philosophy is reflected in DeepSeek's recruiting. Many of the young people who join DeepSeek have no prior experience in model training, and some do not even come from a computer science background.

One DeepSeek member with a physics degree has publicly mentioned that he taught himself computer science by chance: "Because the work is so cutting-edge, there are almost no reference materials. Every problem is solved by designing a solution and testing it myself." Another DeepSeek operations engineer mentioned that before joining the company, he was a "novice" with no relevant experience at all.

"To innovate, the team must break away from inertia." An AI practitioner told "Intelligent Emergence" that nowadays, most domestic AI companies have fallen into the inertia of simply imitating OpenAI. They choose Transformer for the algorithm and follow the Scaling Law for training. "Following the verified path can reduce the risk of failure."

But people often forget that before GPT-3 validated them, the Transformer and the Scaling Law were also regarded as "crazy ideas".

"DeepSeek does not set hard KPIs for its members, nor does it have commercial pressure. Members do not have much experience in model training, which instead enables them not to copy OpenAI's'standard answers'."

The same practitioner said a DeepSeek employee once told him: "These days few vendors bother to modify the Transformer, but DeepSeek's rethinking of the algorithm architecture started from day one. It's not that other vendors couldn't develop MLA (the architecture DeepSeek developed in-house); it's that they wouldn't want to overturn answers already proven correct."

But he also conceded that DeepSeek's confidence ultimately rests on ample computing power and money. "All resources go into one thing: model training. With no other businesses and no marketing spend, they save a great deal of money."

"DeepSeek does not recruit famous bigwigs. They have little motivation for innovation." A headhunter who once cooperated with DeepSeek summarized to "Intelligent Emergence", "Those who have been successful before have already succeeded. They have the burden of not allowing failure. The innovative things should be left to the newcomers."
