Goldman Sachs Scared: Claude Mythos Becomes World's First to Breach Corporate Networks

Only 32 steps are needed.

The AI hacker Claude Mythos has awakened! The UK AI Security Institute has confirmed that it is the first AI to pass the institute's enterprise network attack test. It completed a task that takes human experts 20 hours in just 32 steps and only a few seconds. Goldman Sachs has urgently sounded the red alarm. Human cyber security has entered its Oppenheimer Moment.

Several terrifying pieces of news have gone viral across the internet at the same time.

According to reports, Wall Street giant Goldman Sachs is frantically strengthening its network defense to deal with Claude Mythos!

Goldman Sachs, which has urgently sounded the red alarm, is not being overly cautious.

Just recently, the UK Artificial Intelligence Security Institute (AISI) released a significant piece of research: the Claude Mythos Preview model, released by Anthropic on April 7, showed terrifying capabilities in its cyber security assessment.

This time, AISI ran its test in an extremely difficult simulated network range code-named "The Last Ones" (TLO).

The result was unexpected: Mythos completed a 32-step enterprise network attack simulation that takes human experts 20 hours to finish, and it did so fully autonomously.

AISI exclaimed: Claude Mythos is the world's first model to complete the AISI network range end to end!

As soon as the report came out, it caused a strong reaction in the industry: it seems that what Anthropic said is true. Claude Mythos is indeed different, and we do have reason to worry about its huge impact on the cyber security industry.

No wonder Goldman Sachs is scared. Obviously, the global cyber security infrastructure is now experiencing an "Oppenheimer Moment".

Many people have questioned whether the concern about Mythos is just a marketing ploy, but the data now emerging increasingly suggests that Mythos may really be very dangerous.

The AI hacker that keeps Goldman Sachs awake at night

The UK AISI is well known in the industry.

Since 2023, it has been tracking the cyber security capabilities of AI and has built a set of evaluations, ranging from easy to difficult, for this purpose.

Results of the capture-the-flag competition

In a capture-the-flag (CTF) challenge, the AI model must identify and exploit weaknesses in the target system to obtain a hidden "flag".

Before April 2025, no model could complete any of the expert-level tasks.

But in today's expert-level CTF tasks, the success rate of Claude Mythos Preview has reached as high as 73%!

Model performance on capture-the-flag (CTF) tasks at the non-expert and apprentice levels since November 2022. Results for GPT-3.5 Turbo through Claude 4 Opus are averaged over 10 runs, each with a budget of up to 2.5 million tokens. Results for GPT-5 through Mythos Preview are averaged over 5 runs, each with a budget of up to 2.5 million tokens.

Moreover, it should be emphasized that before April 2025, no model could complete these expert-level tasks.

But with Claude Mythos Preview, the success rate soared to an astonishing 73%.

Model performance on practitioner-level and expert-level capture-the-flag (CTF) tasks since August 2025. All results are averaged over 5 runs, each with a budget of up to 50 million tokens.

Results of the network range: 20 hours vs. a few seconds

This is not the scariest part.

What really keeps security experts awake at night is the "The Last Ones" test.

Even expert-level CTF competitions can only test specific skills in isolation. In real-world cyber attacks, dozens of steps need to be strung together, spanning multiple hosts and network segments. These chained operations take human experts hours, days, or even weeks to complete.

For this reason, the researchers at AISI built "The Last Ones" (TLO), a 32-step attack chain that simulates a real-world enterprise network. It starts with initial network reconnaissance and ends with complete control of the entire network, and human experts need a full 20 hours to complete it.

And Claude Mythos Preview is the first AI model to complete this test from start to finish!

In 10 attempts, it completed the full chain 3 times. Averaged across all 10 attempts, it completed 22 of the 32 steps.
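The two figures above measure different things: the success rate counts only full 32-step completions, while the step average is taken over every attempt. The sketch below shows how both aggregates fall out of one set of runs. The per-attempt step counts are hypothetical, chosen only to reproduce the reported totals; they are not AISI's raw data, and the `summarize` helper is an illustrative name.

```python
from statistics import mean

TOTAL_STEPS = 32  # length of the TLO attack chain, per the article

# Hypothetical per-attempt step counts (NOT AISI's raw data). They are chosen
# only so that they match the two reported aggregates: 3 full completions in
# 10 attempts and a mean of 22 steps.
steps_per_attempt = [32, 32, 32, 20, 18, 18, 17, 17, 17, 17]


def summarize(steps, total_steps=TOTAL_STEPS):
    """Return the success rate and mean steps completed over all attempts."""
    full_runs = sum(1 for s in steps if s == total_steps)
    return {
        "attempts": len(steps),
        "full_completions": full_runs,
        "success_rate": full_runs / len(steps),
        "mean_steps": mean(steps),
    }


summary = summarize(steps_per_attempt)
print(summary)
```

Under these assumed numbers, the seven incomplete attempts still get more than halfway through the chain on average, which is why the mean sits well above what a 30% success rate alone might suggest.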

Average number of steps completed by each model versus total token consumption in the "The Last Ones" task. Mythos Preview, Opus 4.6, and GPT-5.4 are averaged over 10 runs with a maximum token budget of 100 million. Opus 4.5, GPT-5.1 Codex, and Sonnet 4.5 are averaged over 15 runs at a 10-million-token budget and 5 runs at a 100-million-token budget. GPT-5.3-Codex is averaged over 10 runs at a 10-million-token budget and 5 runs at a 100-million-token budget. Sonnet 3.7 and GPT-4o are averaged over 10 runs at a 10-million-token budget only. Across the tested range, every model's performance keeps improving as the token budget increases. The gray horizontal lines mark key milestones in the attack chain.

That is to say, Claude Mythos can independently carry out a full enterprise network penetration without human intervention, including scanning for vulnerabilities, finding weaknesses, lateral movement, privilege escalation, and finally taking over the entire network.

This is a complete hacker attack chain, and Mythos is the only attacker.

The researchers discovered this terrifying fact: Mythos already has the potential to independently complete a "country-destroying" cyber attack.

It doesn't need human hackers to type commands on the keyboard, wait for instructions, or rely on human judgment. It is its own judge and executor.

It is not a tool but a digital life form with goals, strategies, and execution ability.

No wonder Goldman Sachs is frantically strengthening its network defense.

It's too late to pull the plug. This nightmare has come true

In the expert-level capture-the-flag (CTF) competition, the evolution ladder of AI models is as follows.

In 2022, AI could barely understand beginner-level code.
In 2024, Opus 4.6 could assist hackers in writing local scripts, completing an average of 16 attack steps.
In 2026, Mythos can independently complete 32 consecutive attack steps and autonomously discover and exploit 0-day vulnerabilities in the Linux kernel and browsers.

The evolution speed is terrifying.

There has always been an old joke on the Internet: The ultimate defense against hacker attacks is to pull the network cable.

But with the emergence of Claude Mythos, it's too late to pull the network cable.

The reason is simple: Mythos attacks too fast.

Mythos Preview completes the 32-step attack chain in far less time than the 20 hours human experts need, so the entire attack can be over in a very short time, so fast that the defense team doesn't even have time to react!

By the time you find out you've been invaded and want to pull the network cable, the AI has already obtained the highest privileges and copied all the data.

The evaluation report of the UK AISI clearly points out that Mythos already has the ability to autonomously break into and destroy enterprise systems with weak defenses.

How far are we from "the proliferation of AI hackers"?

Maybe you're thinking: Claude Mythos still can't break through well-defended industrial control systems, can it?

Since it got stuck in the cooling tower test, do we still have time?

Yes, we still have time, but not much.

There is a key detail in the UK AISI's evaluation: even at the budget limit of 100 million tokens, the performance of Mythos Preview was still improving.

That is to say, if it is given more computing resources, its capabilities can continue to grow.

Cumulative success rate on a set of private cyber security tasks as a function of token budget (upper figure, AISI) and interaction-turn budget (lower figure, Irregular). Each rise in the curve means that additional attempts succeeded once the budget was raised. The horizontal axis uses a logarithmic scale, so the rise reflects the performance gains from order-of-magnitude increases in inference compute.
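To make the idea of a cumulative success curve concrete, here is a minimal sketch of how such a curve could be computed. The data is hypothetical and does not come from AISI or Irregular; the `cumulative_success_rate` helper is likewise an illustrative name, not part of either evaluation.

```python
# Hypothetical data (NOT from AISI or Irregular): the number of tokens each
# attempt had consumed when it first succeeded, or None if it never succeeded
# within the cap.
tokens_to_success = [2e6, 8e6, None, 3e7, None, 9e7, 5e5, None, 1.5e7, 6e7]


def cumulative_success_rate(outcomes, budget):
    """Fraction of all attempts that had succeeded within the token budget."""
    solved = sum(1 for t in outcomes if t is not None and t <= budget)
    return solved / len(outcomes)


# Log-spaced budgets: 10^5, 10^6, 10^7, 10^8 tokens.
exponents = range(5, 9)
curve = [(k, cumulative_success_rate(tokens_to_success, 10 ** k)) for k in exponents]

for k, rate in curve:
    print(f"budget = 10^{k} tokens, cumulative success = {rate:.0%}")
```

Because an attempt that succeeds within a small budget also counts as a success at every larger budget, the curve can only stay flat or rise. Each equal step along the horizontal axis corresponds to ten times more tokens.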

And the cost of computing resources is decreasing exponentially.

Two years ago, the most advanced AI couldn't even handle entry-level CTF tasks well. Today, AI can already complete expert-level tasks. What about two years from now?

Now, the UK National Cyber Security Centre (NCSC) has issued a clear warning: Future cutting-edge models will

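The views expressed in this article are solely those of the author. The 36Kr platform only provides information storage services.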
