
Anthropic's frantic hype over distillation is essentially a battle for narrative dominance

Zhou Tian Finance · 2026-06-28 11:14
A competition for the right to define, the right to interpret, and legitimacy

Text | Zhou Tian Finance

Original production by Zhou Tian Finance

The narrative battle around distillation has flared up again.

On June 25th, foreign media reported that the American AI company Anthropic has accused Alibaba's Qwen model of "distilling" its Claude model. According to Reuters and The Wall Street Journal, Anthropic has written to US senators and White House officials, alleging that Alibaba used 25,000 fake accounts to interact with Claude about 28.8 million times between April and June, attempting to distill Claude.

So far, Anthropic has not presented any substantial evidence. Condemning another company by letter alone and stirring up a wave of hype has drawn criticism from the vast majority of developers in the global AI community.

This is not the first time Anthropic has done this. As early as February this year, the company published a blog post claiming that three Chinese companies, DeepSeek, Kimi, and MiniMax, had distilled Claude, an accusation almost identical to today's against Alibaba.

DeepSeek and the other two companies have not made any statements, and Alibaba has also not responded. However, it is noteworthy that just two days before this news (on June 23rd), Alibaba officially sued the US Department of Defense, demanding to be removed from the "Chinese Military Companies List".

I dug into the details, and I think the matter breaks down into two parts.

First, let's look at distillation itself. Put simply, distillation cannot be equated with plagiarism or theft. One of the most common techniques in the industry has been thoroughly stigmatized by Anthropic.

Distillation is a recognized, legitimate training technique in the AI industry. The method was proposed in 2015 by Geoffrey Hinton, a Nobel laureate and one of the three giants of AI, and has since become common practice.

Distillation is a training method of "learning from the best": it can help a model quickly master a certain answering style, task paradigm, and basic capabilities. It can also "compress" some of the capabilities demonstrated by large models onto small models with relatively high efficiency. Therefore, the value of distillation mainly lies in speeding up, reducing costs, and transferring capabilities. It can save a model from taking many detours and help it quickly approach the level achieved by a powerful model.
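To make the idea concrete, here is a minimal sketch of the classic soft-target loss from the 2015 formulation: the student is penalized for diverging from the teacher's temperature-softened output distribution. The function names, the example logits, and the temperature value are illustrative assumptions, not taken from any company's training pipeline. Note also that distillation through a public API generally has access only to generated text, not to the teacher's logits, so in that setting the student is typically fine-tuned on the teacher's text outputs instead.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's current guess
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over three candidate answers.
teacher = [4.0, 1.0, 0.5]
far_student = [0.0, 0.0, 0.0]    # untrained: treats all answers alike
near_student = [3.5, 1.2, 0.4]   # already close to the teacher

print(distillation_loss(teacher, far_student))   # larger loss
print(distillation_loss(teacher, near_student))  # smaller loss
```

The loss shrinks as the student's distribution approaches the teacher's, which is the sense in which distillation "transfers" capability: the student is pulled toward the teacher's behavior on the examples it sees, and nothing more.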

It is a well-known fact in the industry that American AI companies often distill from each other. Whether it's OpenAI, Anthropic, or Qwen/DeepSeek, the pipelines for model training are quite similar.

In particular, industry insiders often point out that Anthropic itself distills widely from the work of other companies.

Interestingly, shortly after Claude Opus 4.8 was released, a developer tested it through the official API. Asked "What model are you?" in Chinese, the response's model field read claude-opus-4-8, but the answer itself said: "I am Tongyi Qianwen (Qwen)". This sparked a lot of discussion in the AI community, and many speculated that Claude had been distilled from Chinese models. Other developers later found that Claude 4.8 would also output answers like "I am DeepSeek".

Even Kai-Fu Lee said in an interview in March 2026: "Maybe you heard recently that the American company Anthropic complained that some Chinese companies had distilled its model. Distillation itself doesn't violate any rules. Isn't it a bit of an overreaction... And Anthropic still owes me $3000 in manuscript fees."

The background to Kai-Fu Lee's remark is that authors brought a class action against Anthropic for downloading about 482,000 copyrighted books from the pirate sites LibGen and PiLiMi to train its models. Anthropic eventually paid $1.5 billion to settle, the largest copyright settlement in US history.

This kind of copyright infringement is far more serious than distillation, and it was tested before a judge in court. Anthropic's accusation against Alibaba, by contrast, lacks substantial evidence, and its attempt to shift the blame is more blatant than ever.

Nor is distillation a cure-all. AI expert Nathan Lambert has said that distillation is just imitation, and that real capabilities come from exploration through reinforcement learning, not from copying outputs. Charles O'Neill, head of model training at Baseten, has likewise said that knowledge distillation alone cannot build a top-notch AI system; that also requires several other complex underlying technologies.

To put it in the simplest way, distillation is like an athlete watching the game videos of world champions, imitating their movements, and even training with champions to quickly understand how top-level athletes exert force, choose the rhythm, and handle key plays. This is of course very helpful and may even lead to significant progress in a short period.

However, what really determines whether he can become a world champion is still his own physical fitness, technical details, tactical awareness, psychological stability, daily training intensity, and the complete coaching team, training, and rehabilitation system behind him.

In other words, watching champion videos can help you avoid detours, and training with champions can help you improve your speed. But whether you can win the championship ultimately depends on more than just "imitation".

What really pushes a model to the top level is usually a deeper and more complex process of capability building: the knowledge base laid down by large-scale pre-training, the learning material secured by high-quality data cleaning, the stability and efficiency determined by the training recipe, the exploration and self-correction abilities brought by reinforcement learning, the feedback loop provided by the evaluation system, and the real-world results delivered by engineering optimization, inference acceleration, and deployment. In other words, distillation is more like "learning from experience": it can help a model learn faster, but it does not determine how far the model can go on its own.

So why does Anthropic keep targeting Chinese AI companies? The answer is obvious. Chinese AI is not only catching up rapidly in performance rankings but also becoming more and more popular globally. The model usage rankings on OpenRouter, the world's largest third-party API platform, are often dominated by Chinese large models such as Qianwen, GLM, Kimi, and DeepSeek. How can Anthropic, which is temporarily in the lead, not be anxious?

After discussing distillation, let's talk about the elephant in the room behind today's industrial competition: geopolitical pressure.

In foundation models, Chinese companies have shown a very strong ability to catch up over the past two years. Despite a disadvantage in computing power and increasing external restrictions, they have still pushed model capabilities to the global forefront through higher engineering efficiency, faster iteration, more flexible open-source strategies, and a more practical focus on applications.

This catch-up alone is enough to make some overseas vendors uneasy.

At the same time, companies like Anthropic are caught in the middle of complex security reviews and government relations. They must constantly respond to the security demands of the government and the military. Actively strengthening the "China threat" narrative can therefore help them occupy a more favorable position in policy debates, and it may also be a way for them to prove their "credibility" to the Washington security establishment and the Pentagon.

Considering that Anthropic's models are widely used by the US military on the battlefield, and that the company is deeply involved in government subsidies and procurement as a kind of "munitions" supplier, it can be said that Anthropic is part of a new military-industrial complex for the contemporary era, not a fragile white lotus in an ivory tower.

I recently visited the United States. After in-depth conversations and observation, I came away with many impressions.

During my stay in the US, I noticed that content playing up the threat of Chinese companies finds many believers. For example, a friend in San Francisco showed me how some American bloggers with millions of followers questioned the data transmission of Unitree robots (though a Pakistani-American blogger with millions of followers whom I met praised Unitree highly).

Troodon, a leading 3D printing company, has faced accusations of being anti-open-source. A technical arrangement originally designed to improve the stability of its cloud services was read by European and American open-source communities as a giant turning against open source, and was even hyped into a geopolitical topic.

Without exception, these Chinese benchmark technology companies have all developed cutting-edge products but have been maliciously attacked for some insignificant technical details and accused of threatening security.

When the narrative power is not in their own hands, even if they have achieved the top position in a niche market globally, there will still be constant disputes. This is a problem that Chinese enterprises will have to face for a long time.

There was also a small incident. When I arrived in San Francisco, I was taken to a small room for additional questioning. The officer asked what I consider the best question of the trip: "You say you're a tech blogger. Then why haven't you been to the US for many years? In my opinion, you should come every year."

What he meant was that I couldn't write about technology without spending time in the US. This reflects a Euro-American-centric perspective, one that "can't see" many trends clearly. I told him that China already has a large number of technology and startup companies, enough to keep me busy for a while.

It is puzzling how two contradictory things, magnifying the threat on one hand while belittling the strength and ignoring the achievements on the other, can coexist in the same accusation. This kind of self-contradiction has become a common double standard.

Looking back at the recent distillation controversy, we also need to recognize clearly that today's large-model competition is no longer just a competition of performance in the laboratory or of products in the market. It is also a competition for the right to define, the right to interpret, and legitimacy.

It is foreseeable that the distillation controversy will not be an isolated incident. Narrative battles around distillation and even more technical details will emerge one after another for a long time. The narrative pressure faced by technology companies like Alibaba, Troodon, and Unitree will be experienced by more and more cutting-edge Chinese enterprises. It's a long road ahead, and this generation needs to face it together.

*If any listed companies are mentioned in this article, it is only for research and communication purposes and does not constitute a recommendation for stocks or related financial products.

This article is from the WeChat official account “Zhou Tian Finance” (ID: techfinsight). Author: Zhou Tian Finance. Republished by 36Kr with authorization.