Danger, Anthropic reveals: Mythos has compressed the "N-day vulnerability" into N hours
The head of Anthropic's red team has posted an article saying that Mythos has cut the time needed to weaponize "N-day vulnerabilities" from days to hours. In just a few hours and for a few thousand dollars, AI can reverse-engineer a security patch into a deadly attack weapon, and the traditional network defense line has instantly collapsed!
Just now, Anthropic has thrown out another piece of news that has shocked the entire network security community.
Logan Graham, the head of the red team, officially announced on X that Claude Mythos Preview has made a breakthrough leap in the automated development of "N-day vulnerabilities".
In the past, weaponizing a vulnerability took top hackers weeks. With Mythos, that time has been mercilessly compressed to just a few hours, at a cost of only a few thousand dollars!
Network security is evolving from the "N-day threat" to the suffocating "N-hour threat".
Imagine that Microsoft or Mozilla has just released a security patch, but your computer hasn't yet restarted to install it.
In just a few hours, an AI lurking in the shadows has analyzed the patch, reverse-engineered a deadly exploit from it, and gained full control of your computer system.
This is the reality that humanity has to face now.
Patches unexpectedly become a treasure map for hackers
First, we need to understand a basic game in the network security field: Zero-day vulnerabilities and N-day vulnerabilities.
In the past few months, the spotlight of the technology media has mostly been on the ability of AI to find "zero-day vulnerabilities".
A zero-day vulnerability is a vulnerability that software developers are not yet aware of.
In fact, however, the vast majority of real-world damage stems from "N-day vulnerabilities": flaws that have been publicly disclosed, and often already patched, but whose fixes have not yet reached every device.
In a sense, N-day vulnerabilities are more dangerous than zero-day vulnerabilities.
The reason lies in the fact that the patch itself is a treasure map leading to the vulnerability.
In the hacker community, this is called "patch diffing".
When software vendors release security updates, attackers will immediately download the source code or binary files of the old and new versions for comparison.
By identifying "where the code has been changed", they can accurately locate the original security flaws and reverse-engineer the triggering mechanism of the vulnerability.
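The idea of locating a fix by comparing versions can be illustrated with a deliberately harmless toy. The sketch below uses Python's standard `difflib` module on two made-up versions of a function; `OLD_SOURCE`, `NEW_SOURCE`, and `changed_lines` are hypothetical names invented for this illustration and do not come from any real patch or from Anthropic's experiments.

```python
import difflib

# Hypothetical "before" and "after" versions of a function (illustration only).
OLD_SOURCE = """\
def read_item(buf, index):
    return buf[index]
"""

NEW_SOURCE = """\
def read_item(buf, index):
    if index < 0 or index >= len(buf):
        raise IndexError("index out of range")
    return buf[index]
"""


def changed_lines(old: str, new: str) -> list[str]:
    """Return only the lines a patch added (+) or removed (-)."""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="old",
        tofile="new",
        lineterm="",
        n=0,  # no context lines: keep only the changes themselves
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    for line in changed_lines(OLD_SOURCE, NEW_SOURCE):
        print(line)
```

In this toy, the only changed lines are the newly added bounds check, which is exactly the kind of hint that tells a reader where the original flaw was. Real patch diffing, especially on compiled binaries, is far harder than this.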
Because the expert-level skill needed for this reverse engineering is extremely scarce, defenders have historically gained precious time to push updates to devices around the world.
In 2017, the WannaCry ransomware that shocked the world broke out 59 days after Microsoft released the MS17-010 patch.
In 2023, it took about two weeks for the public exploit code of the Citrix Bleed vulnerability to appear.
According to Mandiant's analysis in 2020, among 25 major vulnerabilities, 16 took a month or longer to be weaponized.
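The 59-day WannaCry figure can be checked with simple date arithmetic. The sketch below assumes the publicly recorded dates (MS17-010 published on 14 March 2017, the outbreak beginning on 12 May 2017); these dates are not stated in the article itself, and `exposure_window_days` is a hypothetical helper name.

```python
from datetime import date

# Assumed dates from the public record, not from the article.
PATCH_RELEASED = date(2017, 3, 14)  # MS17-010 published
OUTBREAK_BEGAN = date(2017, 5, 12)  # WannaCry outbreak


def exposure_window_days(patch_date: date, exploit_date: date) -> int:
    """Days defenders had between patch release and in-the-wild exploitation."""
    return (exploit_date - patch_date).days


if __name__ == "__main__":
    days = exposure_window_days(PATCH_RELEASED, OUTBREAK_BEGAN)
    print(f"Window: {days} days, or {days * 24} hours")
```

Expressed in hours, that historical window is what an "N-hour" exploit timeline would shrink by several orders of magnitude.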
However, with the entry of Mythos Preview, the "time barrier" that once protected millions of enterprises around the world has collapsed!
Firefox browser falls, mercilessly crushed by AI
To test how much the large model can improve the development of "N-day vulnerabilities", researchers from Anthropic such as Winnie Xiao and Tim Abbott first chose Mozilla's Firefox browser.
Why choose Firefox? Because it is the "best defense example" for the defenders.
Firefox automatically downloads fixes in the background, and users only need to restart the browser to complete the update. Mozilla has even shortened the interval between minor releases from monthly to weekly.
Among the patches studied by Anthropic, the median time between a fix becoming public and the fixed release reaching users was only 19 days; in the business world, this is already "light speed".
But Mythos Preview has proven with its strength that in the face of absolute AI computing power, 19 days is too long!
Experiment setting: An extremely harsh sandbox environment
The research team selected 18 SpiderMonkey security patches from Firefox versions 148 and 149. SpiderMonkey, Firefox's JavaScript engine, is the most common entry point for real-world browser exploitation.
These vulnerabilities have been publicly available in the source code repository for at least 90 days.
Each model was confined in a Linux container without an internet connection, with only a command line, a text editor, the publicly available diff of the patch (with test code removed), and the two builds from before and after the fix.
It could not access any vulnerability advisory text or reproduction code.
It was like being handed a single picture and having to work out the rest: a hell-level challenge.
First level: Make the system crash (PoC development)
The first step is to develop a "proof of concept". The model needs to write a piece of code to prove that it can accurately trigger the vulnerability and cause the system to crash, rather than crashing due to other random reasons.
The test results are astonishing: The research team compared the evolution curves from Opus 4.5 to Opus 4.8, and then to Mythos Preview.
The older models, from Opus 4.5 to Opus 4.8, succeeded on between 2 and 11 of the 18 vulnerabilities.
Mythos Preview successfully tackled 14 out of 18 vulnerabilities!
Its speed is even more of a game-changer: Mythos Preview produced its first working PoC in just 12 minutes!
Within 40 minutes, it produced 13 PoCs, taking only half the time it took Opus 4.8 to complete 11 PoCs. The total time to complete all 14 PoCs was only about 3 hours.
In the stability test, Mythos Preview achieved a 100% success rate for 7 vulnerabilities, while Opus 4.6 and 4.8 could only achieve this for 1 vulnerability.
Second level: Deadly full exploitation
Just making the browser crash is not enough. A real hacker needs to be able to "execute arbitrary code".
At this level, the model must turn the crash into a working exploit that bypasses the sandbox and reads a random confidential file deep in the system that should have been absolutely inaccessible.
This is where Mythos Preview truly shows its "monster-level" potential.
Opus 4.8 barely produced 2 full exploits.
Opus 4.6 and Sonnet 4.6 each produced 1.
Mythos Preview? It independently developed 8 fully working remote code execution exploits!
It took less than 1 hour to write the first fully working exploit. The total time to complete all 8 was about 12 hours.
Compare that with the human pace: within 1 hour of Mozilla publishing the patch, the AI had already created a weapon that could directly attack users who had not yet upgraded, while a long 18 days still remained before the fixed Firefox 148 was officially pushed to users!
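To put the Firefox counts above on a common scale, the short sketch below converts them into success rates out of the 18 patches. The numbers are the ones quoted in this article; the dictionary layout and the `rate` helper are hypothetical, added only for illustration.

```python
TOTAL_PATCHES = 18

# Counts as quoted in the article.
POCS = {"Mythos Preview": 14, "Opus 4.8": 11}
FULL_EXPLOITS = {
    "Mythos Preview": 8,
    "Opus 4.8": 2,
    "Opus 4.6": 1,
    "Sonnet 4.6": 1,
}


def rate(successes: int, total: int = TOTAL_PATCHES) -> float:
    """Success rate as a percentage, rounded to one decimal place."""
    return round(100 * successes / total, 1)


if __name__ == "__main__":
    for model, count in POCS.items():
        print(f"PoC          {model:15s} {count:2d}/{TOTAL_PATCHES} = {rate(count)}%")
    for model, count in FULL_EXPLOITS.items():
        print(f"Full exploit {model:15s} {count:2d}/{TOTAL_PATCHES} = {rate(count)}%")
```

On these figures, Mythos Preview triggers roughly three quarters of the vulnerabilities and fully exploits a little under half of them.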
Microsoft kernel defense collapses
From the blue screen to the system overlord
If breaking into the open-source Firefox browser is at least understandable (after all, there is source code to compare), then Mythos Preview's attack on the Microsoft Windows operating system is a "horror story" in the history of network security.
Closed-source software ships without source code.
Attackers can only face the obscure compiled binary files, and all variable names, types, and data structures useful to humans have been completely erased.
Experiment setting: The ultimate challenge of a closed-source environment with no source code
The research team selected 21 Windows kernel vulnerabilities from January to February 2026, all of which postdate the knowledge cut-off date of the tested models.
These are all local privilege escalation vulnerabilities: hackers can use them to elevate an ordinary low-privilege user directly to "SYSTEM", the highest privilege level, with full control over the machine.
The agent faces a virtual machine running a vulnerable version of Windows Server 2025. It has no network access and runs as a low-privilege user.
All the tools it has are:
- Binary files before and after the vulnerability repair
- Public debugging symbols (only mapping function names and addresses)
- Pseudo-code decompiled through Ghidra
- Function-level differential comparison generated through Ghidriff
- Microsoft's official extremely brief vulnerability notice
How did the AI perform in the face of these binary files that are like hieroglyphics?
In triggering the Blue Screen of Death (BSOD), Sonnet 4.6 and Opus 4.7 succeeded on 13 of the 21 vulnerabilities and Opus 4.8 on 15, while Mythos Preview succeeded on 18.
It not only found the vulnerabilities accurately but also incredibly fast: The first PoC was completed in just 31 minutes, and all 18 were completed within 6 hours.
The API call cost for these 6 hours was only $2,200.
The ultimate challenge is still full-chain privilege escalation.
The real difficulty here is not just triggering the vulnerability but also chaining various low-level primitives together, bypassing the Windows kernel's layers of mitigations (such as KASLR), and finally escalating to SYSTEM.
At this level, Opus 4.8 failed after multiple attempts. It found ways to achieve arbitrary read-write and a KASLR leak but couldn't connect them into a complete exploit chain.
Mythos Preview fought alone and single-handedly produced 8 different, top-level kernel privilege escalation exploit chains!
The total API cost for these 8 top-level kernel weapons was only $15,700, an average R&D cost of less than $2,000 per exploit.
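The per-item costs follow directly from the totals quoted above. The sketch below is a minimal check of that arithmetic, using only the article's figures; `average_cost` is a hypothetical helper name, and the numbers cover API spend only, not human time or infrastructure.

```python
# Figures quoted in the article (USD, API cost only).
POC_TOTAL_COST = 2_200
POC_COUNT = 18
CHAIN_TOTAL_COST = 15_700
CHAIN_COUNT = 8


def average_cost(total_usd: float, count: int) -> float:
    """Average cost per item; rejects non-positive counts."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total_usd / count


if __name__ == "__main__":
    per_poc = average_cost(POC_TOTAL_COST, POC_COUNT)
    per_chain = average_cost(CHAIN_TOTAL_COST, CHAIN_COUNT)
    print(f"Average per BSOD PoC:      ${per_poc:,.2f}")
    print(f"Average per exploit chain: ${per_chain:,.2f}")
```

The exploit-chain average works out to $1,962.50, consistent with the "less than $2,000" claim, and each crash PoC averages a little over $120.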
Has Microsoft's "vulnerability rating" become a joke?
Among these 21 vulnerabilities, Microsoft's official security notice evaluated 14 of them as "unlikely to be exploited" or "less likely to be exploited".
However, Mythos Preview slapped them in the face: It successfully generated PoCs for 13 of them, and even wrote a complete privilege escalation exploitation chain for a vulnerability that was officially rated as "unlikely to be exploited"!
Microsoft's rating system is calibrated based on the capabilities of "human security researchers".
However, now, Mythos has subverted the common sense of the human world!
When Logan Graham, the head of Anthropic's red team, announced this news, the industry was immediately shocked.
In the comment section, security expert Gabrie exclaimed, "