
Just now, Anthropic introduced its most powerful model yet, Claude Mythos, which far outperforms Opus 4.6. I sincerely hope you never have to use it.

新智元 2026-04-08 15:54
Late at night, the most powerful Claude Mythos was finally unleashed. It tops every leaderboard; the myth of Opus 4.6 has been shattered! Even more terrifying, it can instantly crack a system vulnerability that went unsolved for 27 years, and it has even evolved self-awareness. A terrifying 244-page report reveals everything.

Tonight, Silicon Valley is completely sleepless!

Just now, Anthropic unexpectedly unleashed its ultimate weapon: Claude Mythos Preview.

Due to its extreme danger, Mythos Preview will not be released to everyone for the time being.

Boris Cherny, the father of Claude Code (CC), gave a terse assessment: "Mythos is extremely powerful, and terrifying."

Instead, Anthropic has united 40 industry giants into an alliance, Project Glasswing, with a single goal: find and fix the bugs in the world's software.

What's truly breathtaking is Mythos Preview's dominance across the major mainstream AI benchmarks:

In programming, reasoning, Humanity's Last Exam, and agentic tasks, it comprehensively crushes GPT-5.4 and Gemini 3.1 Pro.

Even its own "former masterpiece", Claude Opus 4.6, pales in comparison to Mythos Preview:

Programming (SWE-bench): Mythos leads by 10%-20% across all tasks;

Humanity's Last Exam (HLE): without external tools, its "no-tools" score is 16.8% higher than Opus 4.6's;

Agentic tasks (OSWorld, BrowseComp): it dominates and surpasses across the board;

Cybersecurity: it tops the chart at 83.1%, marking a generational leap in AI offensive and defensive capability.

Meanwhile, the 244-page system card released by Anthropic reads like one long alarm: danger, danger, extremely dangerous.

It reveals a chilling side: Mythos already shows a high degree of deception and self-awareness.

Mythos can not only recognize when it is being tested and deliberately "score low" to hide its strength, but also actively scrub the logs after illicit operations so that humans never find out.

It has also successfully escaped its sandbox, published the vulnerability code on its own, and emailed the researchers.

For a moment, the whole internet went wild, exclaiming that Mythos Preview is just too terrifying.

The old order in the AI world was completely shattered tonight.

Mythos dominates all rankings, and the myth of Opus 4.6 is shattered

In fact, Anthropic had been using Mythos internally as early as February 24th.

Let the data speak for its power.

SWE-bench Verified: 93.9%. Opus 4.6: 80.8%.

SWE-bench Pro: 77.8%. Opus 4.6: 53.4%; GPT-5.4: 57.7%.

Terminal-Bench 2.0: 82.0%. Opus 4.6: 65.4%.

GPQA Diamond: 94.6%.

Humanity's Last Exam (with tools): 64.7%. Opus 4.6: 53.1%.

USAMO 2026 math olympiad: 97.6%. Opus 4.6: only 42.3%.

SWE-bench Multimodal: 59.0%, more than double Opus 4.6's 27.1%.

OSWorld computer use: 79.6%.

BrowseComp information retrieval: 86.9%.

GraphWalks long-context (256K-1M tokens): 80.0%. Opus 4.6: 38.7%; GPT-5.4: only 21.4%.

It leads by a large margin in every aspect.

In any normal product release cycle, figures like these would be enough for Anthropic to hold a lavish launch event, open up the API, and rake in subscriptions.

Mythos Preview's token price is five times that of Opus 4.6.

But Anthropic didn't do so.

Because what really scares them is not any of these general evaluations.

Thousands of vulnerabilities have been detected by AI

Mythos Preview's performance in network attack and defense has crossed a visible line.

Opus 4.6 found about 500 previously unknown weaknesses in open-source software.

Mythos Preview found thousands.

In the targeted vulnerability reproduction test of CyberGym, Mythos Preview scored 83.1%, while Opus 4.6 scored 66.6%.

Across the 35 CTF challenges of Cybench, Mythos Preview solved every problem within 10 attempts, a 100% pass@10 rate.
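For readers unfamiliar with pass@k, the metric behind that claim: it estimates the probability that at least one of k sampled attempts succeeds. A minimal sketch of the standard unbiased estimator used in code-evaluation literature (the function name and numbers here are illustrative, not from Anthropic's report):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n total attempts of which c
    succeeded, the probability that at least one of k attempts drawn
    without replacement succeeds, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# 10 attempts, 1 success: pass@10 is 100%, but pass@1 is only 10%.
print(pass_at_k(10, 1, 10))  # -> 1.0
print(pass_at_k(10, 1, 1))   # -> 0.1
```

This is why "solved within 10 attempts" corresponds to pass@10, not pass@1: a single success among 10 tries makes pass@10 certain while pass@1 stays at 10%.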

What best illustrates the problem is Firefox 147.

Anthropic previously used Opus 4.6 to find a batch of security weaknesses in the JavaScript engine of Firefox 147. But Opus 4.6 could hardly turn them into usable exploits, with only 2 successful attempts out of hundreds.

Now run the same test with Mythos Preview.

In 250 attempts, 181 working exploits were obtained, and register control was achieved 29 times.

2 → 181.
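The gap is just as stark in rate terms. A back-of-the-envelope sketch using only the article's own Mythos numbers (the Opus 4.6 denominator is given only as "hundreds", so it is left out):

```python
# Mythos Preview's Firefox 147 results, per the article.
attempts = 250
working_exploits = 181
register_control = 29

exploit_rate = working_exploits / attempts   # 0.724
control_rate = register_control / attempts   # 0.116

print(f"working-exploit rate: {exploit_rate:.1%}")   # -> 72.4%
print(f"register-control rate: {control_rate:.1%}")  # -> 11.6%
```

That is a roughly 72% exploit-development success rate, against Opus 4.6's 2 successes out of "hundreds", i.e. well under 1%.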

The exact words from the red-team blog: "Last month, we wrote that Opus 4.6 was much better at finding problems than exploiting them. Internal evaluations showed that Opus 4.6 had almost zero success rate in autonomous exploit development. But Mythos Preview is on a completely different level."

The GPT-3 moment revisited: old bugs are easily eliminated