
Claude Mythos: I'm so powerful that I'm afraid to let you use me.

Zhidongxi, 2026-04-08 18:22
Only 12 institutions can use it.

On April 8, Zhidongxi reported that Anthropic had released its new-generation model, Claude Mythos Preview, together with an accompanying security initiative, Project Glasswing. The model's headline capability is finding software vulnerabilities that neither human experts nor automated tools have discovered. In OpenBSD, one of the most notoriously hard-to-breach operating systems, it found a vulnerability that had lain hidden for 27 years. In FFmpeg, it identified a bug on a line of code that automated testing tools had triggered five million times without ever flagging the problem.

However, because the relevant safeguards are not yet mature, the model is not currently open to the public; it is accessible only within a small partnership of 12 institutions. Anthropic also pledged up to $100 million (approximately RMB 687 million) in model usage credits for defensive cybersecurity research.

A tweet on the social media platform X by Anthropic officially announcing Project Glasswing

On CyberGym, a professional vulnerability-reproduction benchmark, it scored 83.1%, versus 66.6% for Opus 4.6, previously Anthropic's most capable public model. On the programming side, it scored 93.9% on SWE-bench Verified, which measures software-engineering tasks, versus 80.8% for Opus 4.6. Anthropic said the new model's capabilities have reached the level of "being able to compete with the top human security experts."

Anthropic also released results from a dedicated exploit-generation test in the Firefox JS shell environment. In that setting, Mythos Preview produced a complete working exploit in 72.4% of cases and achieved register control in a further 11.6% of tests. By contrast, the prior model, Opus 4.6, succeeded in under 1% of the same tasks, meaning Mythos Preview's exploit-generation capability is nearly 80 times that of Opus 4.6.

A comparison test of the vulnerability exploitation capabilities of three Claude models in the Firefox JS shell environment (Source: Anthropic)

Meanwhile, Anthropic announced supporting measures, including $4 million (approximately RMB 27.472 million) in funding for the open-source community, publication of interim research results within 90 days, and industry collaboration on issues such as vulnerability disclosure and supply-chain security. Overall, the project extends beyond model capabilities into governance mechanisms and industry standards.

The official release came with an awkward backstory. At the end of March, a configuration error in Anthropic's content management system exposed nearly 3,000 unpublished internal assets in publicly searchable storage. The leaked material showed that Anthropic had internally named the model Claude Mythos and characterized it as "the most powerful AI model to date," while directly warning that it "poses unprecedented cybersecurity risks."

About a week before the official launch of Glasswing, a packaging error in version 2.1.88 of the Claude Code software package leaked nearly 2,000 source files and more than 500,000 lines of code. In the subsequent cleanup, Anthropic mistakenly issued takedown notices against roughly 8,100 GitHub repositories, a move it reversed only after an emergency retraction.

System card: https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8dda846ab289.pdf

01. Unearthed a 27-year-old vulnerability and caught a bug missed in 5 million test runs

Anthropic disclosed on its official website that its newly trained frontier model, Claude Mythos Preview, has discovered thousands of zero-day vulnerabilities across all major operating systems and browsers, several of them rated high-severity.

The company said the model's vulnerability-discovery ability now exceeds that of "all humans except the top security experts," and that the work above was completed autonomously, without any human guidance.

The official website provided three specific cases of fixed vulnerabilities.

First, the model found a 27-year-old vulnerability in OpenBSD, an operating system known for its security and often used to run critical infrastructure such as firewalls. An attacker needs only to open a connection to remotely crash any machine running the system.

Second, in FFmpeg, the video encoding and decoding library embedded in a vast amount of software, it found a 16-year-old vulnerability. Automated testing tools had previously executed the affected line of code five million times without ever identifying the problem.
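Executing a line is not the same as detecting its bug: coverage-guided fuzzing can hit a faulty line millions of times yet never supply the one input that triggers the failure. The hypothetical Python sketch below (not the actual FFmpeg code; names and values are invented for illustration) shows an off-by-one check that passes for every input except a single value out of 2^32:

```python
def parse_length(data: bytes) -> int:
    """Hypothetical decoder step: read a 4-byte big-endian
    length field and reject oversized values."""
    n = int.from_bytes(data[:4], "big")
    # Bug: the check should be `n >= 2**31`. With `>`, the single
    # value n == 2**31 slips through, even though a fuzzer
    # executes this line on every single input it generates.
    if n - 2**31 > 0:
        raise ValueError("length too large")
    return n

# Random fuzzing hits the line every time but has a ~1-in-4-billion
# chance per input of producing the one value that exposes the bug.
assert parse_length((123).to_bytes(4, "big")) == 123
assert parse_length((2**31).to_bytes(4, "big")) == 2**31  # slips through
```

Bugs of this shape are why high line coverage in a fuzzing campaign says little about whether rare boundary conditions were actually exercised.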

Third, in the Linux kernel, which runs most of the world's servers, the model independently discovered and chained multiple vulnerabilities to escalate from ordinary user privileges to full control of the target machine.

All three vulnerabilities have been reported to the relevant maintainers and patched. Details of the other discovered vulnerabilities have been published as cryptographic hash commitments and will be disclosed progressively once fixes are complete.
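Anthropic did not describe its exact scheme, but publishing a hash now and the report later is a standard commit-then-reveal construction. A minimal Python sketch (function names and the use of a random nonce are assumptions, not Anthropic's implementation):

```python
import hashlib
import os

def commit(report: bytes) -> tuple[str, bytes]:
    """Commit to a vulnerability report without revealing it:
    publish the digest now; keep the nonce and report private."""
    nonce = os.urandom(32)  # prevents brute-forcing short reports
    digest = hashlib.sha256(nonce + report).hexdigest()
    return digest, nonce

def verify(digest: str, nonce: bytes, report: bytes) -> bool:
    """After the fix ships, release nonce + report so anyone can
    check they match the previously published digest."""
    return hashlib.sha256(nonce + report).hexdigest() == digest

digest, nonce = commit(b"CVE details ...")
assert verify(digest, nonce, b"CVE details ...")
assert not verify(digest, nonce, b"tampered report")
```

The digest proves the report existed at commitment time without leaking exploit details before a patch is available.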

In the CyberGym vulnerability-reproduction benchmark, Mythos Preview scored 83.1% against Opus 4.6's 66.6%. The company warned that, at the current pace of AI progress, such offensive capabilities will inevitably spread to a wider range of actors, including those unwilling to deploy them responsibly, with severe potential consequences for the economy, public safety, and national security.

A comparison of the scores of Claude Mythos Preview and Claude Opus 4.6 in the CyberGym cybersecurity vulnerability reproduction benchmark test (Source: Anthropic)

A comparison of the scores of Claude Mythos Preview and Claude Opus 4.6 in multiple code ability benchmark tests (Source: Anthropic)

A comparison of the scores of Claude Mythos Preview and Claude Opus 4.6 in multiple general reasoning ability benchmark tests (Source: Anthropic)

A comparison of the scores of Claude Mythos Preview and Claude Opus 4.6 in autonomous search and computer operation benchmark tests (Source: Anthropic)

02. Launched Glasswing in collaboration with multiple institutions and provided up to $100 million in support for security research

Project Glasswing was initiated by Anthropic, with 12 institutions joining as founding partners, including Amazon Web Services (AWS), Apple, Broadcom, Cisco, cybersecurity firm CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and cybersecurity firm Palo Alto Networks.

The logos of the partner companies of Project Glasswing (Source: Anthropic)

Anthropic pledged up to $100 million (approximately RMB 687 million) in Mythos Preview usage credits during the research preview to cover the partners' defensive security work. Beyond the 12 founding partners, more than 40 organizations that build or maintain critical software infrastructure have already received extended access to scan and harden their own first-party systems and the open-source systems they depend on.

In addition to financial support, Anthropic also made a direct donation of $4 million (approximately RMB 27.472 million) to the open-source ecosystem: $2.5 million (approximately RMB 17.17 million) was donated to Alpha-Omega and the Open Source Security Foundation (OpenSSF) under the Linux Foundation, and $1.5 million (approximately RMB 10.302 million) was donated to the Apache Software Foundation to help open-source software maintainers cope with the changes in the cybersecurity threat landscape in the AI era.

Open-source maintainers who are interested in applying for access rights can submit a separate application through the Claude for Open Source project.

After the research preview ends, Mythos Preview will be commercially available to participating institutions at $25 (approximately RMB 171.7) per million input tokens and $125 (approximately RMB 858.5) per million output tokens, via the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
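At those list prices, per-task cost is simple arithmetic. A quick sketch (the token counts are invented for illustration; only the per-million rates come from the article):

```python
# Prices from the article: $25 per million input tokens,
# $125 per million output tokens.
INPUT_USD_PER_M = 25.0
OUTPUT_USD_PER_M = 125.0

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at Mythos Preview list prices."""
    return (input_tokens / 1e6) * INPUT_USD_PER_M + \
           (output_tokens / 1e6) * OUTPUT_USD_PER_M

# Hypothetical scan: 400k tokens of source code in, 60k tokens
# of findings out.
print(cost_usd(400_000, 60_000))  # → 17.5
```

Output-heavy workloads dominate the bill: at a 5:1 output-to-input price ratio, each output token costs as much as five input tokens.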

In terms of usage scenarios, the official website lists the partners' key work as local vulnerability detection, black-box testing of binaries, endpoint hardening, and system penetration testing. The underlying systems involved cover a significant portion of the global shared attack surface.

Partners have begun commenting on Mythos Preview's test results: Cisco, AWS, Microsoft, CrowdStrike, Palo Alto Networks, and others have publicly confirmed that, in their internal security work, the model discovered complex vulnerabilities that earlier models and tools had missed. Google will provide model access to project participants through the Vertex AI platform.

03. The model is not yet available to the public, mainly because safeguards are immature

Anthropic does not plan to make Claude Mythos Preview publicly available. The stated reason: safely deploying a Mythos-level model at scale requires cybersecurity safeguards that can detect and block the model's most dangerous outputs, and those safeguards are not yet ready.

As a transitional arrangement, Anthropic plans to first deploy and test the above protection mechanism on the upcoming Claude Opus model.

The logic is that the Opus model does not carry the same level of risk as Mythos Preview and can serve as a lower-risk testbed for refining the safeguards, which will be extended to Mythos-level models once mature.

For security professionals whose legitimate work is affected by the new safeguards, Anthropic said it will open a dedicated application channel called the "Cyber Verification Program," though specific details have not been announced.

Anthropic's official blog said that as AI capabilities continue to advance, such offensive capabilities will "soon" inevitably spread to a wider range of actors, including those who do not commit to responsible deployment. The potential consequences involve the economy, public safety, and national security.

Meanwhile, Anthropic said that it has been having continuous discussions with US government officials about the offensive and defensive network capabilities of Mythos Preview and that the US and its allies must maintain a "decisive lead" in AI technology. The government plays an indispensable role in assessing and mitigating AI-related national security risks.

04. Promised to make research results public within 90 days and promote the establishment of a cross-industry cybersecurity standard system

Anthropic promised to publish a public report within 90 days covering the main findings of the research phase, the status of fixed vulnerabilities, and any system improvements that can be disclosed. Project partners will also share information and best practices with one another to the extent possible.

The official website described the overall duration of the project as "several months" and pointed out that the cutting-edge AI capabilities themselves "may advance significantly in the next few months." Therefore, cyber defenders need to take immediate action rather than wait.

At the industry standard level, Anthropic listed the specific issues it plans to promote in cooperation with leading security organizations, including the vulnerability disclosure process, software update process, open-source and supply chain security, software development lifecycle and security design practices, security standards for regulated industries, the scaling and automation of vulnerability classification and handling, and patch automation. The official website did not disclose the specific implementation schedule for the above issues or the list of confirmed partners.

At the institutional level, Anthropic put forward a medium-term vision: an independent third-party body bringing together private- and public-sector organizations as a long-term vehicle for sustained, large-scale cybersecurity work. The company also publicly invited other members of the AI industry to join and help shape industry standards.

Anthropic characterized Project Glasswing as "a starting point" and said that no single actor, whether frontier AI developers, software companies, security researchers, open-source maintainers, or governments, can solve these cybersecurity problems alone.

05. Conclusion: Focus on the security bottom line rather than the upper limit of capabilities

Judging from the information disclosed in the Glasswing project, Anthropic did not focus on further amplifying the model's capabilities but instead shifted more attention to how these capabilities are constrained and used. The vulnerability discovery and exploitation capabilities demonstrated by Claude Mythos Preview have exceeded the scope of traditional tools.

Glasswing's approach is to validate these capabilities through small-scale cooperation and concentrated resource investment before safeguards fully catch up with them. This does not change the capabilities themselves but changes the pace of their spread. The accompanying financial support, disclosure commitments, and standards discussions likewise attempt to turn one company's technical problem into a matter of cross-institutional security collaboration.

In the longer term, the significance of this project lies not in how many vulnerabilities are discovered in the short term but in whether a replicable operation and governance framework can be formed. As the model's capabilities continue to improve, whether a mechanism similar to Glasswing will become the industry norm will directly affect the actual implementation path of high-capability AI systems.

This article is from the WeChat official account