
Don't overlook the double-sided game of Silicon Valley's AI elites.

赛格大道 2026-05-25 09:19
Openly repair the plank road while secretly making a detour through Chencang.

There is a systematic divergence between what Silicon Valley's AI elites say in public and what they actually believe and where they deploy their resources. They share a strategic tacit understanding on suppressing China's AI: in public, the threat theory is used to stockpile policy ammunition (export controls, crackdowns on distillation, and restrictions on cloud leasing), while behind the scenes, generation-gap weapons are used to lock in a real generational lead.

Almost a year and a half has passed since DeepSeek's breakout at the beginning of last year. Faced with the smokescreen put up by Silicon Valley, China's public discourse has largely been captured by the surface narrative, and few are willing to step outside their cognitive comfort zone.

01

A large model that keeps the finance ministers of Japan and Canada awake at night

Senior financial officials in Japan, Canada, and the UK have recently issued warnings after witnessing the capabilities of a new US model called Mythos. It is currently Anthropic's real flagship model, with about 10 trillion parameters and a cost of about $10 billion for a single training run.

Sergey Brin, the Google co-founder who pushed Gemini back to the top, remarked after trying Mythos: "This is a large model at the AGI level." This is the first time a Silicon Valley figure of founding-godfather stature has acknowledged that AGI has been achieved in a specific model.

But in China, ordinary users have never heard of it. Even AI practitioners who know the name have no clear picture of its actual strength.

Mythos is not publicly available: it is not offered through public APIs, is not listed on LMArena, and does not compete with any model on public leaderboards. Instead, access is controlled through a mechanism called "Project Glasswing".

The glasswing is a butterfly native to South America. Its seemingly delicate, transparent wings can withstand a load 40 times its body weight. The AI security mechanism Anthropic envisions is not a "Great Wall of Steel" built by piling up materials, but a long-term mechanism with great resilience.

The Greta oto from the Americas is called the "glasswing butterfly" because of its transparent wings.

Twelve founding partners have access, including key infrastructure players such as AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, and the Linux Foundation. About 40 other organizations that maintain "critical software infrastructure" have limited access (the UK AI Safety Institute has tested Mythos). However, the national financial regulators and national security agencies of allies such as Canada, Japan, and the EU are largely absent from the list, which means they can neither independently evaluate the capability boundaries of Mythos nor verify its potential impact on their own critical systems.

This is exactly why the finance ministers of Japan and Canada had to discuss defense plans together at the IMF meeting. Since Mythos appeared in April, Mozilla Firefox has fixed more security vulnerabilities in a single month than in all of 2025, about 20 times its monthly average. Used for an attack, the model could systematically scan another country's critical infrastructure for vulnerabilities within a few hours.

Interestingly, the reaction of financial officials is not "I also want this model" but "I want to defend against it", a first in AI history. This is certainly related to the rift between the US and its traditional allies, but the more fundamental reason is that the model's capabilities are powerful enough to frighten financial officials, who are generally conservative. After all, Mythos is being handled almost like a nuclear weapon: a strategic asset that must not proliferate.

So, this brings us back to the real starting point of the problem: When a country manages its "truly strongest AI" and its "AI for external evaluation" separately, all "gap analyses" based on public lists lose their meaning.

In China's public discourse, the report cited most often over the past month is the "AI Index Report 2026" released by Stanford HAI in April. According to the report, the performance "gap between Chinese and US large AI models has narrowed to 2.7%". Many Chinese practitioners, investors, policymakers, and even ordinary people have drawn great confidence from this.

Headlines citing the AI Index Report 2026, such as "A historic turning point, there is no gap between Chinese and US large models" and "A shocking reversal, China locks the overall situation with only a 2.7% gap", have spread through the public discourse.

02

Tearing open the truth behind the leaderboards

If Mythos is the model the US has not released, an evaluation at the end of April by CAISI, a center under the US National Institute of Standards and Technology (NIST), shows that the gap among the models the US has released is also much larger than expected.

CAISI used an unpublished benchmark to test DeepSeek V4 Pro:

CAISI belongs to an official US agency (the Department of Commerce), so its scores cannot be bought or inflated.

According to the table, the gap between the latest released models on each side is not 2.7% but more than 30 percentage points, reaching 39 points in cybersecurity, the most sensitive dimension.

CAISI's conclusion is that DeepSeek V4 Pro's actual capabilities are equivalent to those of GPT-5 eight months earlier, and that the gap is widening. In fact, DeepSeek's own technical blog for V4 compares it with GPT-5.4, not with GPT-5.5, which was released at almost the same time.

Why doesn't CAISI's conclusion match the Stanford report? Because the methods are different.

The fundamental problem with public benchmarks (such as MMLU, HumanEval, and HLE) is that the longer they are public, the more targeted optimization each company applies to them. Once the question types have been thoroughly analyzed, model teams can use data augmentation and reinforcement learning to raise scores specifically on those benchmarks. After a while, scores on public leaderboards reflect each company's "test-taking ability" rather than real capability. The gap CAISI measured with unpublished benchmarks is the real gap.

DeepSeek claims capabilities close to Claude Opus 4.6. That holds on public benchmarks, but on CAISI's unpublished benchmarks the model is actually equivalent to GPT-5 from the Opus 4.4 era. That is a gap of two version numbers, eight months, and the iteration cadence of the three strongest companies.

The evaluation results show that GPT-5.5 scores highest across the board, followed by Claude Opus 4.6, with DeepSeek V4 last.

Of course, this evaluation does not include Mythos. What is the real gap including Mythos? There is no public answer, but it will almost certainly widen further.

03

The key: The exponential computing power gap between China and the US

Where does the gap come from? The answer is computing power.

The data is from public sources, compiled by the author.

A rather cruel fact is that Meta's AI capital expenditure alone in 2026 is already close to the combined total of all top Chinese AI companies. DeepSeek V4 Pro has 1.6 trillion parameters in total, less than one-sixth of the 10-trillion-parameter scale reached in the US.

xAI's brute-force approach of training seven models simultaneously is only possible with extremely abundant computing power. In essence, it spends computing power to buy certainty and keeps whichever run succeeds. Even so, it has not won: Elon Musk recently merged it into SpaceX, and it lost its status as an independent operation.

Whereas Silicon Valley companies can afford brute-force screening, top Chinese model companies, represented by DeepSeek, have to be frugal in their algorithms. Some may say this reflects the engineering capabilities of Chinese companies, but more realistically it is a last resort. Everyone understands that frugality can recover 80% of the capability, but the last 20%, the generational gap, must be backed by sufficient computing power.

It is a fact that the production capacity of domestic chips is catching up rapidly. But even if the production capacity increases, can it achieve the same results as NVIDIA? This is another challenge that is seriously underestimated.

A study published in November 2025 (by institutions including the University of Auckland, the Hong Kong Polytechnic University, Lingnan University, and Harbin Institute of Technology) ran the first large-scale hands-on test of five enterprise-grade AI accelerators: NVIDIA H200, AMD MI300X, Intel Max 1100, Huawei Ascend 910B, and Apple Mac M4 Pro. It tested each one against more than 100,000 variants synthesized from 4,000 real PyTorch models.

The conclusion is shocking:

Operator support: NVIDIA H200 supports 488 operators, while Huawei Ascend 910B supports only 407, or 17.5% fewer. What Huawei lacks is exactly what large models rely on most: quantized inference, sparse operations, flash_attention, NLP embedding, fused training operations, and advanced linear algebra.

Output consistency rate: When the same model is run on the same data, AMD's output is 99.8% consistent with NVIDIA's; Huawei's is 95% consistent, meaning a 5% probability of a different result; the Mac's is 86%.

Platform defects: NVIDIA has 1, AMD has 4, and Huawei has 13. On Huawei's platform, heap memory crashes occur 10 times as often as on the other platforms, and it has 80 times as many unsupported operators as NVIDIA.

A 5% output inconsistency rate is unacceptable in high-reliability scenarios such as finance, healthcare, and autonomous driving. In large-model pre-training, this deviation is amplified into systematic error over millions of iterations.
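The amplification argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the consistency rates quoted above can be read as independent per-run probabilities, which the study itself does not claim, and the function names are hypothetical.

```python
import math


def prob_all_consistent(per_run_rate: float, runs: int) -> float:
    """Probability that `runs` independent executions all match the reference."""
    return per_run_rate ** runs


def runs_until_mismatch_likely(per_run_rate: float, threshold: float = 0.5) -> int:
    """Smallest run count at which at least one mismatch has probability >= threshold."""
    return math.ceil(math.log(1.0 - threshold) / math.log(per_run_rate))


# Consistency rates quoted in the article. Treating them as independent
# per-run probabilities is an assumption made only for this sketch.
RATES = {
    "AMD MI300X": 0.998,
    "Huawei Ascend 910B": 0.95,
    "Apple Mac M4 Pro": 0.86,
}

for platform, rate in RATES.items():
    print(platform, prob_all_consistent(rate, 100), runs_until_mismatch_likely(rate))
```

Under these assumptions, a mismatch becomes more likely than not after 14 runs at a 95% rate, compared with 347 runs at 99.8%, which is why a seemingly small per-run difference matters over long training jobs.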

The above comes from an academic paper, but there is also a real-world footnote. Just a few days after the Google I/O conference, David Holz, founder of the San Francisco AI video-generation company Midjourney, publicly complained that using Google's TPU chips had set the company's models back by a year, and expressed his admiration for NVIDIA chips.

Midjourney was once the undisputed leader in AI image generation, but it now looks rather mediocre and has been overtaken by many products. The case shows intuitively how irreplaceable NVIDIA's high-end chips currently are for most companies. A public complaint like this inevitably draws external pressure, so the post soon disappeared. Still, the praise for NVIDIA chips reflects hands-on experience and is arguably more convincing than the paper.

This means that the computing power gap between chips is a problem not only of quantity but also of quality. For China's AI to be truly independent, it must not only raise Ascend production from 6,000 to 7,000 units per month to NVIDIA's level, but also close the 17.5% shortfall in PyTorch operator support, raise the output consistency rate from 95% to nearly 100%, and cut platform defects from 13 to about 1. That is a far harder project than simply expanding production.

Over the past year, China's AI circle has repeatedly told one story: DeepSeek trained a top-tier model on 2,048 H800s, proving that algorithms can offset computing power. The story is true, but its limits are rarely spelled out.

A study published at the main conference of ACL 2026 (a cooperation between the University of Science and Technology of China and the Shanghai AI Laboratory) gives a clear mathematical conclusion:

The scaling law of pre-training is a power-law relationship and can in theory be extended indefinitely.

Post-training with reinforcement learning (RL) follows a log-linear relationship and has an absolute ceiling.

RL essentially activates capabilities already present in the pre-trained output distribution; it does not create new ones.

In plain language, post-training cannot lift a 1.6-trillion-parameter model to the capability level of a 10-trillion-parameter model.
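The difference between the two curve shapes can be sketched numerically. This is a toy illustration, not the paper's model: the function names, the exponent `alpha`, the `slope`, and the `ceiling` are all hypothetical values chosen only to show how a power law keeps paying off with scale while a capped log-linear gain flattens out.

```python
import math


def pretrain_loss(params: float, scale: float = 1.0, alpha: float = 0.08) -> float:
    """Power-law form: loss falls as params ** -alpha (alpha is an illustrative value)."""
    return scale * params ** (-alpha)


def rl_gain(compute_multiple: float, slope: float = 1.0, ceiling: float = 3.0) -> float:
    """Log-linear form with a hard cap (hypothetical): each 10x of RL compute
    adds `slope` points until `ceiling` is reached. Expects compute_multiple >= 1."""
    return min(slope * math.log10(compute_multiple), ceiling)


# Parameter counts quoted in the article.
small, large = 1.6e12, 10e12

# Under the power law, the larger model always has strictly lower loss.
loss_ratio = pretrain_loss(large) / pretrain_loss(small)
print("loss ratio (10T vs 1.6T):", loss_ratio)

# Under the capped log-linear form, extra RL compute eventually buys nothing.
for multiple in (10, 1_000, 1_000_000, 1_000_000_000):
    print(multiple, rl_gain(multiple))
```

In this toy setup the pre-training curve never stops improving as parameters grow, while the RL curve stops improving once the cap is hit, which is the shape of the argument the study is said to make.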

DeepSeek's "algorithms offsetting computing power" offsets wasted computing power at a given parameter scale, not the capability gap between different parameter scales. Catching up with the US at the frontier requires the ability to pre-train at the 10-trillion-parameter level, and that physically requires a cluster of hundreds of thousands of high-end GPUs, several months of continuous training, and an investment of tens of billions of dollars. Without that computing power foundation, the frontier gap is a mathematical constraint, not a matter of engineering effort.

04

Suppression and confusion: Silicon Valley's double act

Put together the closed access to Mythos, CAISI's unpublished evaluation, the exponential computing power gap, the inconsistencies across heterogeneous accelerators, and the mathematical ceiling of post-training. Then look back at the messages Silicon Valley has repeatedly pushed into China's public discourse over the past six months, that "China's AI is catching up" and "the gap has narrowed to 2.7%". One thing becomes clear: there is a systematic divergence between what Silicon Valley's AI elites say in public and what they actually believe and where they deploy their resources.

On the surface, the China threat theory is used to