Just now, Claude Mythos has shattered the ceiling of AI evaluations, soaring exponentially, and accelerating towards the 2027 singularity.

新智元, 2026-05-11 14:58
Claude Mythos breaks through the upper limit of the METR evaluation, and the super-exponential growth of AI approaches the AGI singularity.

[Introduction] Just now, Claude Mythos rendered the evaluation "invalid": METR failed to measure accurately for the first time, and the inflection point of AI offense and defense has arrived! The evolution of AI has become an "alien civilization" descending, surpassing exponential growth, and the AGI singularity in 2027 is accelerating towards humanity.

Just now, Claude Mythos exceeded the upper limit of the METR evaluation! Super-exponential evolution is approaching the AGI singularity.

Today, a trend chart has gone viral across the internet.

METR, the world's most authoritative AI evaluation institution, found to its alarm that Mythos was about to burst its "thermometer".

The capabilities of Claude Mythos Preview have broken through the ceiling of the human evaluation framework and entered the "distortion zone"!

Leopold Aschenbrenner, a former member of OpenAI's Superalignment team, once predicted that AGI would arrive in 2027. The latest data now shows that Mythos' performance sits slightly above the trend line of that 2027 scenario.

The "alien civilization" has forcefully landed, and its shadow has covered the entire sky.

A major earthquake in the evaluation field: when "full marks" no longer make sense

In its latest test, METR attempted to measure the AI's ability to complete long, complex tasks (time horizons).

METR uses an indicator called the "50% time horizon" - that is, the task length X, measured by how long the task takes humans, at which the model has a 50% probability of completing the task successfully and independently.
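One common way to estimate such a 50% point is to fit a logistic curve to success versus the logarithm of human task time and read off where it crosses 50%. The sketch below does that with plain gradient ascent. The task results in `RESULTS` are made up for illustration, the function name `fit_time_horizon` is hypothetical, and the article does not confirm that this is METR's exact procedure.

```python
import math

# Hypothetical results: (human completion time in minutes, 1 = model succeeded).
RESULTS = [
    (1, 1), (2, 1), (4, 1), (8, 1), (16, 1),
    (32, 1), (32, 0), (64, 1), (64, 0), (128, 1), (128, 0),
    (256, 0), (512, 0), (1024, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_time_horizon(results, steps=5000, lr=0.1):
    """Fit P(success) = sigmoid(a + b * (log2(minutes) - mean)) and
    return the task length in minutes at which P(success) = 0.5."""
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [ok for _, ok in results]
    mean_x = sum(xs) / len(xs)  # centering keeps gradient ascent stable
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * (x - mean_x))
            grad_a += err
            grad_b += err * (x - mean_x)
        a += lr * grad_a / len(xs)
        b += lr * grad_b / len(xs)
    # The curve crosses 0.5 where a + b * (x - mean_x) = 0.
    x50 = mean_x - a / b
    return 2.0 ** x50


horizon_minutes = fit_time_horizon(RESULTS)
print(f"Estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```

In this toy data the model always succeeds on short tasks and always fails on the longest ones, so the estimated horizon lands in the middle band where results are mixed.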

Earlier models scored in the range of tens of minutes to a few hours.

However, when Claude Mythos took the test, the data simply went off the charts: it achieved a 50% success rate on extremely complex, long-horizon tasks that take humans 16 hours to complete!

You might ask: What about testing tasks that take 32 hours or 64 hours?

The answer from METR is terrifying: "We can't measure them anymore."

Among the 228 devilish test tasks carefully constructed by METR, only 5 are classified as "16 hours or more". What does this mean?

It means that the bank of hard problems humanity was so proud of has been all but exhausted by AI.

It's like using a 1-meter tape measure to measure a skyscraper. Apart from knowing that it "goes off the scale", we know nothing about its true height.

Has the "alien civilization" arrived?

In the range above 16 hours, METR simply doesn't have enough samples to conduct an accurate quantitative comparison of Mythos.

METR admits that above this threshold, the data measurement becomes "unstable and meaningless".
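To see why a handful of tasks cannot support a precise estimate, one can compare confidence intervals for a success rate measured on 5 tasks and on 228 tasks. The sketch below uses the standard Wilson score interval. The task counts 5 and 228 come from the figures above, while the success counts (3 of 5, 137 of 228) are hypothetical and chosen only to give a similar observed rate of about 60%.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical success counts; only the task counts come from the article.
low_5, high_5 = wilson_interval(3, 5)
low_228, high_228 = wilson_interval(137, 228)

print(f"5 tasks:   {low_5:.2f} to {high_5:.2f}")
print(f"228 tasks: {low_228:.2f} to {high_228:.2f}")
```

With only 5 tasks the interval spans most of the range between 0 and 1, so it cannot say whether the true success rate is above or below 50%.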

This is an extremely rare scene in human history: the creator has lost the tool to measure the capabilities of the created.

When the "examiner" can no longer come up with questions, how terrifying is the real strength of the "examinee"?

This is not just a regular iteration of an AI model, but a "super-exponential" species mutation. The old rules are collapsing, and AI has become an "alien civilization" descending!

Chase Brower, an AI practitioner and a well-known Silicon Valley observer, said bluntly that the development of AI far exceeds the industry's expectations: According to SemiAnalysis data, the annualized revenue of the AI industry has far exceeded the previous prediction of about $26 billion for the second quarter of 2026.

The current AI technology is hovering in the sky of human civilization like a "clearly visible alien spaceship".

Humans can no longer understand the super-exponential growth of AI!

This is no longer just data from a laboratory. It indicates that the signs of AGI have fully emerged!

Super-exponential: faster than exponential growth

Let's take a closer look at METR's trend chart.

The vertical axis represents the duration of coding tasks that the AI can complete autonomously, ranging from 8 seconds to 5 years, on a logarithmic scale. The horizontal axis represents the model release time, from 2021 to 2028. Each point represents a model version.

Because the axis is logarithmic, ordinary exponential growth would appear as a straight line. Connect the points, however, and the curve is not a straight line but an arc that bends upward, steeper than exponential.

AI is growing super-exponentially, and the growth rate of AI itself is accelerating.
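The difference between the two growth modes can be made concrete by counting doublings. On a logarithmic axis, steady exponential growth adds the same number of doublings every year, while super-exponential growth adds more doublings as time goes on. The sketch below is a toy model, not METR's data: the 6-month first doubling and the 0.9 shrink factor are assumptions picked purely for illustration.

```python
def doublings_completed(years, first_doubling_years=0.5, shrink=1.0,
                        max_doublings=1000):
    """Count doublings finished within `years` when each doubling takes
    `shrink` times as long as the previous one.

    shrink = 1.0 gives plain exponential growth (constant doubling time).
    shrink < 1.0 gives super-exponential growth (shrinking doubling time).
    `max_doublings` guards against the finite-time blow-up of the toy model.
    """
    elapsed = 0.0
    duration = first_doubling_years
    count = 0
    while count < max_doublings and elapsed + duration <= years:
        elapsed += duration
        duration *= shrink
        count += 1
    return count


exp_counts = [doublings_completed(y, shrink=1.0) for y in range(1, 5)]
sup_counts = [doublings_completed(y, shrink=0.9) for y in range(1, 5)]

for year, (e, s) in enumerate(zip(exp_counts, sup_counts), start=1):
    print(f"year {year}: exponential {e} doublings, super-exponential {s}")
```

Under these assumed parameters the exponential case gains the same number of doublings each year, whereas the shrinking-doubling-time case pulls further ahead in every later year.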

  • In 2021, the best model could autonomously complete tasks on the order of 8 seconds - writing a line of code or fixing a spelling error.
  • At the beginning of 2023, it reached the 1-minute level - a small function or a simple debugging session.
  • In mid-2024, it soared to about 1 hour - the implementation of a complete feature or a multi-file refactoring.
  • In April 2026, the performance of Mythos Preview: 16 hours - a complete engineering sub-project, including reading code, understanding the architecture, formulating a plan, writing the implementation, and debugging and testing, all in one go without human supervision.

The leap in each generation is greater than the previous one, and the interval between generations is shorter.

Human evolution has equipped us to calculate the distance between fruits and prey on the grassland, and our brains are naturally linear.

We've just managed to understand "exponential growth", but now we're forced to face an exponential above the exponential.

The ape-like brains of humans simply crash when faced with super-exponential growth.

METR drew several reference lines on the chart.

One of them is based on the joint predictions of multiple institutions: assuming that AI capabilities continue to grow in line with today's mainstream expectations, the threshold of artificial general intelligence (AGI) will be reached around 2027.

Mythos' data point falls above this line.

It's not just a slight deviation. Before the time axis reaches 2027, the capability value has already exceeded the predicted value for 2027.

After reading the METR report, Chase Brower, an AI infrastructure practitioner, tweeted that the description of "Agent-1", expected to appear in early 2026, actually underestimates the capabilities of the current best models. The entire industry's estimate of the speed of AI development is too conservative.

There's a detail that's easy to overlook.

The vertical axis of METR's chart is not a score, not an accuracy rate, and not a percentage on a certain benchmark. It is task duration, which has no built-in ceiling the way a 100% score does. And there's currently no sign of the curve slowing down.

The atomic moment in the security circle: from "assistant" to "autonomous attacker"

If METR's concerns are still academic, then Palo Alto Networks' warning is a brutal report from the real world.

Recently, Palo Alto Networks gained early, unrestricted access to cutting-edge models such as Mythos and GPT-5.5-Cyber.

The test results sent shivers down the spines of all defenders: AI has crossed the threshold of "autonomy".

What can an AI model do in the security field when it can work autonomously for 16 hours?

Time collapse: 3 weeks = 1 year

There's a shocking piece of data in Palo Alto's report: using Mythos to assist in vulnerability analysis, the depth and breadth of work completed in just 3 weeks was equivalent to the workload of an entire top-level penetration testing team for a whole year.

Link: https://www.paloaltonetworks.com/blog/2026/05/frontier-ai-defense/

This is a "dimensionality reduction strike": a blow from a level the defender simply cannot match.

Previously, AI could only help you write a script or search for a code snippet. However, Mythos has demonstrated an almost terrifying "intuition for software vulnerabilities".

It can identify scattered, low-risk vulnerabilities in tens of thousands of lines of code.

What's even more terrifying is that it can, like a top-level hacker, connect these originally insignificant vulnerabilities into a deadly attack chain.

With AI assistance, the entire process from initial intrusion to data exfiltration has been compressed to 25 minutes.

In the past, an attack of this level might have taken a team weeks of lurking to complete.

How can we save ourselves before the singularity hits us?

Anthropic once refused to fully release Claude Myth