DeepSeek Makes the Cover of Nature; Led by Liang Wenfeng, the Team Responds to Controversy for the First Time
Abstract: DeepSeek responds to the "distillation" controversy for the first time.
On September 17, 2025, Chinese artificial intelligence had another moment in the spotlight. The DeepSeek-AI team, led by Liang Wenfeng, published its research on the open-source model DeepSeek-R1 in the journal Nature and made the cover of that issue.
Image | From the Internet
The paper shows that the reasoning ability of large language models (LLMs) can be significantly improved through pure reinforcement learning, reducing dependence on human-annotated data. Compared with models trained by conventional methods, the model trained this way performs better on mathematical problem-solving, programming competitions, and graduate-level STEM problems.
Here, DeepSeek also responded to the "distillation" controversy for the first time. In its exchanges with reviewers, DeepSeek stated plainly that R1 does not learn by copying reasoning examples generated by OpenAI models. Like most other large language models, R1's base model was trained on web data, so it inevitably absorbs the AI-generated content already present on the Internet.
"The Low - cost Miracle": From $290,000 to the World Stage
In the AI world there is a harsh consensus: the barrier to entry for top-tier large models has never been algorithms but cost. Outside estimates put OpenAI's bill for training GPT-4 at over $100 million, and Google, Anthropic, and Meta compete with budgets in the tens of millions of dollars. Capital and computing power have become the decisive factors in who gets a say.
DeepSeek, however, broke this unwritten rule. According to details disclosed in the paper's supplementary materials, the training cost of DeepSeek-R1 was an astonishingly low $294,000. Even adding roughly $6 million for training the base model, the total remains far below what foreign giants spend.
The real breakthrough of DeepSeek-R1 lies not only in cost but also in methodological innovation.
In the Nature paper, the research team describes a pure reinforcement learning (RL) framework built on the Group Relative Policy Optimization (GRPO) algorithm: the model is rewarded solely for the correctness of its final answer, rather than for imitating human reasoning paths.
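For intuition, here is a minimal sketch of the group-relative advantage at the heart of GRPO, written in PyTorch; this is not the team's actual code, and the function names and toy answers are purely illustrative. A group of completions is sampled for each prompt, each completion earns a rule-based correctness reward, and its advantage is that reward normalized against its group's mean and standard deviation:

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Normalize each answer's reward against its own group's mean and
    # standard deviation; only relative correctness drives the update.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def correctness_reward(answers, reference):
    # Rule-based reward: 1.0 if the final answer matches the reference,
    # 0.0 otherwise; the reasoning path itself earns nothing.
    return torch.tensor([1.0 if a == reference else 0.0 for a in answers])

# One prompt, a group of G = 4 sampled completions (toy data).
answers = ["42", "41", "42", "7"]
rewards = correctness_reward(answers, reference="42")
print(grpo_advantages(rewards))   # correct answers get positive advantage
```

Because the baseline is the group's own average reward, this setup needs no separately trained value network, which is part of what keeps the method cheap.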
Surprisingly, this seemingly coarse training signal leads the model to spontaneously develop advanced behaviors such as self-reflection, self-verification, and long chains of thought, sometimes spending hundreds or even thousands of tokens deliberating over a single problem.
This is especially evident in mathematics. According to the paper, on the American Invitational Mathematics Examination (AIME 2024), the accuracy of DeepSeek-R1-Zero jumped from 15.6% to 77.9%, and reached 86.7% with self-consistency decoding, exceeding the average human level.
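Self-consistency decoding is a standard technique rather than something introduced in this paper: sample several independent solutions to the same problem and take the majority vote over their final answers, which filters out one-off reasoning errors. A minimal sketch, with made-up sample answers:

```python
from collections import Counter

def self_consistency(sampled_answers):
    # Majority vote over independently sampled final answers.
    return Counter(sampled_answers).most_common(1)[0][0]

# Five sampled solutions to the same problem; "34" wins the vote.
print(self_consistency(["34", "34", "17", "34", "2"]))  # -> 34
```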
Nature commented that this indicates the model can independently form complex thinking patterns through reinforcement learning, without human reasoning demonstrations.
Through subsequent multi-stage optimization (reinforcement learning, rejection sampling, supervised fine-tuning, then a second round of RL), the final DeepSeek-R1 not only excels at hard tasks such as mathematics and programming but also stays fluent and consistent on general tasks such as writing and Q&A. In other words, DeepSeek is not "teaching AI to think" but "letting AI learn to think on its own".
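Of these stages, rejection sampling is the easiest to picture: the RL-trained model generates many candidate solutions, only those with correct final answers are kept, and the survivors become supervised fine-tuning data. A self-contained toy sketch, with a random stand-in for the model; every name here is hypothetical, not DeepSeek's code:

```python
import random

def rejection_sample(generate, prompts, reference, n=16):
    # Sample n completions per prompt; keep only those whose final
    # answer matches the reference. Survivors become SFT data.
    kept = []
    for p in prompts:
        for _ in range(n):
            answer = generate(p)        # stand-in for model sampling
            if answer == reference[p]:
                kept.append((p, answer))
    return kept

# Toy "model" that answers correctly about 30% of the time.
reference = {"2+2?": "4"}
generate = lambda p: reference[p] if random.random() < 0.3 else "5"
data = rejection_sample(generate, ["2+2?"], reference)
print(f"kept {len(data)} of 16 samples for fine-tuning")
```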
Liang Wenfeng's Decade-Long Journey
Behind DeepSeek-R1's success, beyond the technical breakthrough, lies a little-known story of persistence. Liang Wenfeng was born in 1985 to an ordinary family in Zhanjiang, Guangdong; his father is a primary school teacher. His upbringing is not widely known, but its details show early curiosity and perseverance.
In 2002, the 17-year-old Liang Wenfeng was admitted to Zhejiang University to study Electronic Information Engineering. Five years later he stayed on for a master's degree in Information and Communication Engineering under Xiang Zhiyu, focusing on machine vision. It was during his master's studies, as the global financial crisis swept the world, that he and his classmates tried applying machine learning to financial markets, exploring fully automated quantitative trading. Opportunities abounded, including an invitation from DJI founder Wang Tao to start a company together, but Liang Wenfeng chose the road less traveled: convinced that artificial intelligence would change the world, he decided to build his own venture.
After finishing his master's degree, Liang Wenfeng first combined artificial intelligence with quantitative trading, founding Jacobi Investment and High-Flyer (Huanfang) Technology and growing them steadily for more than a decade. In 2023 he turned to general artificial intelligence, founded DeepSeek, and set out on the road of large-model R&D. With a dual focus on algorithms and cost efficiency, DeepSeek released the V2 and V3 models within just two years, driving down inference costs for domestic large models and stunning the global market with its price-performance.
Liang Wenfeng's approach to team building is also unusual. He puts ability first: most core positions are filled by fresh graduates and engineers with only one or two years of experience. "We may not be able to find the top 50 talents in China, but we can train them ourselves." That belief is also key to how DeepSeek achieved strong reasoning ability at low cost.
Looking back now, the value of DeepSeek's research goes far beyond a powerful model. It reads more like a "methodological manifesto", showing the world a more sustainable path for AI evolution that does not depend on massive amounts of labeled data. It breaks the spell of capital as a barrier and hands the initiative in AI development back to scientific innovation itself.
This is not only a moment of glory for Chinese AI but also a milestone in the global shift toward a "reasoning revolution". Lewis Tunstall, a machine-learning engineer at Hugging Face and one of the paper's reviewers, believes that "R1 has started a revolution": more and more people are applying R1's methodology to improve existing large language models.
The AI race ahead is likely to shift from an arms race over data and computing power to a contest of algorithmic ingenuity, and DeepSeek-R1 has sounded the horn for that new competition.
This article is from the WeChat public account "Phoenix Tech", author: Jiang Fan. Republished by 36Kr with permission.