
97.6% is close to a perfect score. Claude's most powerful model is released, but users are not allowed to use it: it's too dangerous.

Lei Technology · 2026-04-09 11:03
When can we use the most powerful model on Earth?

Last night, the newly released preview version of Claude Mythos by Anthropic set the entire AI community abuzz.

Anthropic officially bills the Claude Mythos preview as "the most powerful AI model to date," representing a whole new level of capability and significantly outperforming its previous strongest model, Claude Opus 4.6.

At least based on the data and results presented so far, this is not just marketing talk; it's a real qualitative leap. First of all, the preview version of Claude Mythos ranks first in almost all public benchmark tests. Even more astonishing is its improvement rate:

On SWE-bench Verified for software engineering, the score soared from Opus 4.6's 80.8% to 93.9%, and on SWE-bench Pro it jumped from 53.4% to 77.8%. On USAMO 2026, a high-difficulty mathematical reasoning benchmark, it leapt from 42.3% straight to 97.6%, almost a perfect score.

Image source: Anthropic

It can be said to be the most powerful model on Earth at present.

And those are just the "small" numbers. More striking still, in practical tests over the past few weeks, the Mythos preview autonomously discovered thousands of high-risk zero-day vulnerabilities across mainstream operating systems and browsers, including core components such as the Linux kernel, OpenBSD, the Firefox browser, and FFmpeg.

Many of these vulnerabilities had escaped human security teams despite decades of review. In OpenBSD, for instance, an operating system famed for its security, the Mythos preview found a remote crash vulnerability that had been hidden for 27 years. Anthropic even stated flatly that the Mythos preview far exceeds any other AI model in cybersecurity capability.

This is not just a "friendlier Claude." It has reached unprecedented autonomy and depth in writing code, reasoning, and security work. Developers were looking forward to "finally having productivity fully liberated," but instead:

Anthropic closed the door directly.

Yes, at least for now, the Claude Mythos preview is not open to the public. According to Anthropic, the Mythos preview is currently restricted to "defensive cybersecurity": only 12 partners (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) plus more than 40 organizations that build or maintain critical software infrastructure have access to it.

Image source: Anthropic

This is Project Glasswing, launched by Anthropic at the same time. Anthropic has also allocated $100 million to support those 40-plus additional organizations in using the Mythos preview to maintain the "foundations" of the open-source ecosystem.

But why is such a "strongest model" being kept hidden and not made available to the public?

The weapon is too powerful and needs a transition

To be clear, the Claude Mythos preview, or super-large models of a similar level, will eventually be made available to the public. Anthropic's official statement is blunt:

"Although we currently have no plans to make the Claude Mythos preview available to the public, our ultimate goal is to enable users to safely deploy Mythos-level models at scale, not only for cybersecurity but also for the numerous other benefits these powerful models will bring."

As the official blog implies, this model is "too dangerous".

At the end of last year, the Google Threat Intelligence Group (GTIG) discovered two real-world malware samples, PromptFlux and PromptSteal. At runtime they connect directly to commercial large-model APIs (such as the Gemini API), dynamically generate malicious scripts, obfuscate their own code in real time, and can create new functionality on the fly to fit the target environment, completely bypassing traditional signature-based detection.
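The detection-evasion point is worth unpacking. Traditional signature-based scanners match known byte patterns or file hashes, so code that rewrites itself on every run never matches a stored signature. Here is a minimal Python sketch of hash-based signature matching; the payload strings and signature set are hypothetical, purely for illustration:

```python
import hashlib

# Hypothetical "signature database": hashes of known malicious payloads.
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"malicious_payload_v1").hexdigest(),
}

def signature_match(payload: bytes) -> bool:
    """Return True if the payload's hash matches a known signature."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD_HASHES

# The original sample is caught by its signature...
assert signature_match(b"malicious_payload_v1")

# ...but a trivially mutated variant (one byte changed) slips through.
# This is why malware that re-generates and re-obfuscates its own code
# at runtime defeats purely signature-based detection.
assert not signature_match(b"malicious_payload_v2")
```

Real antivirus engines layer heuristics and behavioral analysis on top of signatures for exactly this reason, but code that is freshly generated per target weakens all pattern-based layers at once.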

This is no isolated case. According to a report from the market research firm SQmagazine, AI-driven cyberattacks reported globally have risen 47%, to an estimated 28 million-plus incidents.

Against that backdrop, the Mythos preview's vulnerability-hunting ability stands out. Compared with Claude's previous strongest model, Opus 4.6, whose success rate at autonomously discovering and exploiting vulnerabilities was close to 0%, the Mythos preview's performance is extraordinary.

Take the (already patched) vulnerability in the Mozilla Firefox 147 JavaScript engine as an example: Claude Opus 4.6 attempted the exploit hundreds of times and succeeded only twice, while the Claude Mythos preview succeeded 181 times in the same test.

Image source: Anthropic

According to the test report, in internal red-team exercises over the past few weeks, the offensive capabilities demonstrated by the Mythos preview far exceeded those of top human security experts. It can not only find vulnerabilities but also autonomously discover, chain, and exploit thousands of high-risk zero-days.

As is well known, hackers divide into white hats and black hats. White hats typically notify maintainers when they find a security hole, and may even submit patches to open-source projects. Black hats are different: they are likely to weaponize those same vulnerabilities to attack systems.

The Mythos preview can both attack and defend, but its offensive potential alone is large enough to cause concern. In the wrong hands, it could instantly arm an AI-grade attack chain. Anthropic itself says this is no ordinary frontier model; its general capabilities are strong enough to push cyber warfare into a new dimension.

In computer security, offense and defense have always been locked in an arms race in which each defensive inch gained is answered by an offensive foot. Over the past two years, security red-teaming of large AI models has likewise become an industry focus, especially at large companies: in China, ByteDance and Ant Group have both run large-model attack-and-defense exercises, pitting red teams (attackers) against blue teams (defenders) to surface and fix security challenges of the AI era.

Image source: Global AI Large Model Attack-and-Defense Challenge

However, Anthropic also points out that in the long run, a powerful language model like the Mythos preview benefits the defending "blue team" more. In the short term, though, opening the Mythos preview to the public would quickly let attackers hit global networks with unprecedented efficiency. The key problem is that defense is reactive while offense takes the initiative, and given the incentives, attackers are the more motivated to actively wield a model like the Mythos preview.

So, for a "smooth transition", Anthropic launched the "Project Glasswing".

It's worth mentioning that the inspiration for this project name comes from a species of clearwing butterfly widely distributed in the Americas, which is more commonly known as the "glasswing butterfly" because of its transparent wings. Although they look fragile, their wings can actually carry a weight equivalent to 40 times their own body weight.

Glasswing butterfly, Image source: Pixabay

The logic of Project Glasswing is simple: let the defenders get the weapon first. Before attackers obtain same-level AI, defenders can patch the holes and learn to run security defense on top of advanced AI.

From this perspective, not releasing Claude's strongest model is the right call. Indeed, even for ordinary Claude users, keeping the Claude Mythos preview closed for now brings more benefit than harm.

Is Claude better off with the strongest model closed?

Many people are disappointed when they learn that the preview version of Mythos is not open. Their first reaction is: Why isn't such a powerful model made available to everyone?

However, if you are an ordinary Claude user, or a developer writing code and shipping projects with Claude Code every day, you may notice a somewhat counter-intuitive fact: keeping the Mythos preview closed for now actually does us more good than harm.

Let's first talk about the most obvious pain points recently.

Since around February this year, Claude and Claude Code have suffered an "epic performance regression." The r/ClaudeCode and r/ClaudeAI subreddits on Reddit have been flooded with related posts: "4.6 Regression is real!" read one, while another complained that "Claude Code has been dumb over the last 1.5-2 days."

Image source: Reddit

Some developers tracked the data and found that the number of file reads per task had dropped from 6-7 to only about 2. On complex tasks the model has become "lazier": its depth of thinking has clearly decreased, and it often edits first instead of researching first.

Stella Laurenzo, AMD's AI director, even said publicly that Claude Code has become "dumber and lazier" and cannot be trusted with complex engineering tasks.

Boris, a member of the Claude Code team, replied on Hacker News and admitted there has been a regression in some agentic use cases. The core changes were the "redact-thinking" feature and Adaptive Thinking introduced in February, which let the model decide for itself how long to think. As a result, thinking depth on complex tasks dropped by about 67%.

Image source: LinkedIn

Similar voices have also been heard on X. Developers have complained that Claude Code has degenerated into an "intern" that needs to be closely monitored throughout the process.

Why does this happen?

The pattern in training ultra-large models is familiar: whenever a big lab goes all-in on the next-generation "strongest model," it needs enormous compute. Before Google shipped Gemini 3.0/3.1, developers often complained that Gemini 2.5 Pro got dumber after silent updates, forgetting content in long contexts and failing more often on logical tasks. The same happened before GPT-5's release, when GPT-4o drew feedback about shorter outputs, laziness, and mechanical handling of complex instructions.

Compute is finite, and training a whole new tier of model like Mythos is extremely costly. Resources can only be squeezed out of current services through dynamic load balancing, adaptive effort reduction, and even light optimization, so users feel the model getting "dumber and lazier."

On top of that, Claude Code's user base has grown far beyond expectations, repeatedly straining the infrastructure, and training and testing the Mythos preview (internally codenamed Capybara) demand priority access to top-tier GPUs. Releasing the Mythos preview without opening it to the public therefore means no further dilution of compute, and no further decline in the quality of Claude or Claude Code.

For ordinary Claude users, the experience will be more stable.

Image source: Anthropic

On the other hand, through Project Glasswing, Anthropic is using Mythos to help large companies and open-source projects fix vulnerabilities. Once those holes are patched, every user ultimately benefits indirectly.

When Anthropic has the risks better controlled and the infrastructure ready, and can safely deploy Mythos-level models at scale, ordinary users will get a genuinely stable, powerful experience with no "dumbing down," instead of suffering the pain of a compute squeeze from a premature release.

Conclusion

The emergence of the preview version of Claude Mythos presents a cruel but real problem to everyone: the more powerful the AI, the more real the risks.

When the strongest model's offensive capability far outstrips the current defense system, Anthropic's choice not to release it is not conservatism. It buys time for the whole industry, letting defenders shore up the foundations first and letting ordinary users keep a relatively stable Claude experience, rather than being dragged into the chaos of compute squeezes and security run amok.

For most people, this may be the best arrangement at present.
