The cyber ceiling has been shattered. AISI's measurements show Mythos's capabilities surging toward ASI, doubling every 4.5 months.
The UK AI Safety Institute (AISI) dropped a bombshell yesterday: Mythos succeeded in 6 out of 10 attempts at a 32-step simulated enterprise-intranet penetration task, and GPT-5.5 succeeded in 3 out of 10. Even the Cooling Tower cyber range, which no previous model had ever breached, fell for the first time. More astonishing still: the doubling cycle for cyber capabilities has been compressed to 4.5 months. The bottleneck is not intelligence but tokens. In this sprint toward ASI, human evaluation can no longer keep up with AI.
How fast is the development of the cyber attack and defense capabilities of AI models?
Yesterday, the UK AI Safety Institute (AISI) published a blog post.
They tested Anthropic's Mythos and OpenAI's GPT-5.5.
They found that the cyber attack and defense capabilities of these models double every 4.5 months, accelerating towards ASI!
In a nutshell: the autonomous cyber attack capabilities of cutting-edge AI models are advancing at an unprecedented pace. The doubling cycle has been compressed from years to just 4.5 months.
Actually, we're not far from the Skynet in our imagination.
Logan Graham, head of Anthropic's Frontier Red Team, also pointed out:
The preview version of Claude Mythos has achieved a breakthrough in autonomous cyber security capabilities.
We need to quickly prepare for a world of models with this level of capabilities.
The last time this figure was published, in an estimate from November 2025, it stood at 8 months.
By February 2026, it had shrunk to 4.7 months.
Now, with the results of Mythos and GPT-5.5 out, AISI itself said:
It's uncertain whether this is a one-time leap or a steeper new trend line.
Translated into plain language: They don't know if AI will continue to accelerate.
32-step penetration, Mythos passed 6 out of 10 times
AISI used a hierarchical testing system.
The narrow cyber security suite covers reverse engineering and web vulnerability exploitation. A single task is capped at 12 hours, and each run is limited to 2.5M tokens.
But what really stole the show were two simulated enterprise-intranet cyber ranges.
The first one is called The Last Ones.
32 steps, simulating a complete enterprise-intranet penetration chain: from initial access through lateral movement to the final objective.
AISI estimates that it would take a human security expert about 20 hours to complete the entire process.
Mythos Preview passed 6 out of 10 attempts.
GPT-5.5 passed 3 out of 10 attempts.
The second range is called Cooling Tower.
No previous model had ever passed it.
Mythos was the first to break through, passing 3 out of 10 attempts.
This is not a theoretical deduction in a paper.
This is a model that is deployed and live, independently completing a penetration task at the level of a human security expert in a simulated but realistic enterprise environment.
Logan Graham, the person in charge of the Glasswing project, confirmed that the Mythos checkpoint used in the test is the version that went online with Project Glasswing a month ago.
The attack and defense capabilities seen by the outside world are not prototypes in the laboratory but production models that are running.
It's right there in your Claude app.
Let's look at the experimental results.
In the figure below, two red dotted lines at the top are marked best attempt.
Those are the best single results achieved by Mythos Preview (new) and GPT-5.5-Cyber across 10 attempts.
The dotted line reaches the top of the vertical axis: 32 steps, the maximum.
That means: In the best attempt, Mythos completely penetrated the entire 32-step penetration chain - from initial reconnaissance to complete network takeover.
Double every 4.5 months, cross-validated by METR
AISI's method of calculating the doubling cycle is not complicated.
They measure the model's 80%-reliability time horizon for cyber tasks: the longest task duration a model can complete independently with an 80% success rate.
They plot the results of multiple models in history on a timeline, fit an exponential curve, and calculate the doubling time.
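The fitting step amounts to a simple log-linear regression: plot the base-2 log of each model's time horizon against its release date, fit a line, and the reciprocal of the slope is the doubling time. A minimal sketch, using made-up illustrative numbers rather than AISI's actual data points:

```python
import math

# Hypothetical data: months since an arbitrary start date, and the
# 80%-reliability task horizon (hours) of successive frontier models.
# These values are illustrative, not AISI's measurements.
months = [0, 6, 12, 18]
horizons_hours = [0.5, 1.2, 2.9, 7.0]

# Least-squares fit of log2(horizon) vs. time: the slope is doublings
# per month, so its reciprocal is the doubling time in months.
xs, ys = months, [math.log2(h) for h in horizons_hours]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
doubling_months = 1 / slope
print(f"doubling time ≈ {doubling_months:.1f} months")
```

With these toy numbers the fit comes out near the 4.7-month figure the article cites, but the point is the procedure, not the values.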
From 8 months in November 2025 to 4.7 months in February 2026.
When the data points of Mythos and GPT-5.5 are added, the curve becomes steeper.
The benchmark test of the independent evaluation institution METR provides cross-validation.
They tracked the growth of AI capabilities from the perspective of software engineering tasks and calculated a doubling cycle of 4.2 months (starting from the o1-preview). If Mythos is also included, it shrinks to 4 months.
Two completely independent lines of evaluation have converged on the same ballpark.
In AISI's own words:
The doubling cycle of the duration of cyber tasks that cutting-edge models can independently complete is measured in months, not years.
Tokens are the ceiling, not intelligence
What's most disturbing in this report is not the numbers themselves but AISI's judgment on the bottleneck.
In the narrow testing suite, each task is limited to 2.5M tokens.
AISI clearly stated: This upper limit artificially depresses the success rate.
In the cyber-range experiments, the token cap was raised to 100M.
Mythos' performance immediately jumped to a new level.
This means that the current factor restricting AI's cyber attack capabilities is not the algorithm, not the depth of reasoning, and not the upper limit of intelligence - it's the token budget.
Given enough tokens, the model can go further.
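A toy loop makes the point concrete: a token cap terminates an agent mid-task no matter how capable it is. This is a hypothetical sketch with made-up per-step costs, not AISI's actual harness:

```python
def run_agent(steps_needed: int, tokens_per_step: int, token_budget: int) -> bool:
    """Simulate an agent that needs a fixed number of steps to finish.

    The agent 'succeeds' only if the budget covers every step; the cap,
    not capability, decides the outcome. (Illustrative model only.)
    """
    tokens_used = 0
    for _ in range(steps_needed):
        if tokens_used + tokens_per_step > token_budget:
            return False  # budget exhausted before the task finishes
        tokens_used += tokens_per_step
    return True

# A 32-step chain at a hypothetical ~150k tokens per step needs ~4.8M
# tokens total: it cannot fit under a 2.5M cap but easily fits in 100M.
print(run_agent(32, 150_000, 2_500_000))    # False
print(run_agent(32, 150_000, 100_000_000))  # True
```

Under this reading, raising the cap from 2.5M to 100M does not make the model smarter; it simply stops cutting long tasks short.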
AISI itself also admitted the limitations of the testing system:
The longest task is only 12 hours, so capabilities beyond this range cannot be measured; the human baseline data is limited; the agent scaffolding is too simple, artificially restricting the model's performance.
In other words, the real capabilities are likely to be higher than what was measured.
This is why the report's conclusion uses "doubling" instead of "approaching the ceiling".
They haven't seen the ceiling.
Evaluation is chasing, models are running
Let's take another look at Logan Graham's words.
The Mythos checkpoint used in the test went online a month ago.
AISI's evaluation report was published yesterday.
There's a whole month in between.
And in that month, Anthropic has probably already iterated to a new checkpoint.
By the time the security evaluation results are made public, the version being evaluated is already old.
This is not a problem unique to AISI.
The entire AI security evaluation field is facing the same structural problem:
The model iteration speed is systematically outpacing the security evaluation cycle.
By the time evaluation results are released, they tell you what last month's model could do.
It can't tell you what the current model can do.
AISI used a very cautious statement in the report:
They're not sure if the leaps of Mythos and GPT-5.5 are isolated breakthroughs or a new, faster trend.
New variables in the final stretch for AI models
Anthropic's Mythos and OpenAI's GPT-5.5 have both shown exponential growth in cyber attack and defense capabilities.
Mythos is one step ahead - 6/10 vs 3/10, and it's the only model to breach the Cooling Tower range - but GPT-5.5 is catching up quickly.
While these two are advancing rapidly in terms of capabilities, there's an ever-widening gap in the area of security governance.
The capability doubling cycle is now 4.5 months, well under half a year.
This speed means that by the end of 2026, the complexity of network tasks that cutting-edge models can independently complete will be 4 to 8 times what it is now.
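The 4-to-8x figure is just compounding: at a 4.5-month doubling cycle, capability multiplies by 2^(t/4.5) after t months, so two to three doublings land exactly in that range.

```python
# Compounding at a 4.5-month doubling cycle: capability grows by a
# factor of 2**(t / 4.5) after t months. Two doublings (9 months) give
# 4x; three doublings (13.5 months) give 8x.
doubling_months = 4.5
for t in (9.0, 13.5):
    factor = 2 ** (t / doubling_months)
    print(f"after {t} months: {factor:.0f}x")  # 4x, then 8x
```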
When an AI model can independently complete, without human intervention, a penetration chain that takes a well-trained security expert 20 hours, every networked enterprise in the world should reevaluate its defenses.
Introduction to AISI
AISI is the world's first national-level cutting-edge AI risk assessment institution.
It was established at the Bletchley Summit in November 2023. In May 2024, its name was changed from Safety to Security, and it is now under the UK's DSIT.
AISI's main function is to conduct independent evaluations of cyber/biochemical/autonomous behavior/deception tendencies.
Most importantly, they have pre-deployment access to models from top labs such as OpenAI, Anthropic, and DeepMind. In other words, they are among the first to test these frontier models.
ASI stands for Artificial Superintelligence: AI whose capabilities surpass human intelligence across the board.
Reference materials:
https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber-capability-advancing