
OpenAI has also started to fear the new models it has trained.

New Intelligence Yuan · 2026-04-13 09:07
Altman: You should also be afraid of my new model! 😨

Watching the huge impact and heated discussion stirred up by the powerful cyber attack-and-defense capabilities of Anthropic's next-generation flagship model, Mythos, still in internal testing, Sam Altman couldn't sit still and began planning an internal test of OpenAI's own AI with comparable cyber attack-and-defense capabilities.

On April 9th, Axios reported that OpenAI is preparing a product with strong cybersecurity capabilities, which will initially be available only to a small number of partners.

https://x.com/axios/status/2042244222292529371

What's really worth discussing here is not whatever new thing OpenAI is about to release.

It's another scarier thing: AI may have truly crossed the line in the field of cybersecurity.

If you've been following AI in the past few months, you'll notice a particularly subtle change.

Two years ago, when people talked about models, they discussed whether they could write copy, put together slide decks, or write code.

Later, the discussion shifted to agents, automatic execution, and whether they could call tools on their own.

Now, the discussion has veered in another direction.

Can it find vulnerabilities on its own, reproduce them, and exploit them?

These questions sound like topics for an internal meeting in the security circle, not something ordinary people would care about.

But frankly, once the answer starts to lean towards "yes", this is no longer just a small piece of news in the hacker community.

This is a major event at the infrastructure level.

Because a vulnerability is not as simple as a blue-screen error you fix by restarting your computer.

Vulnerabilities are connected to water treatment plants, power grids, hospitals, banks, browsers, operating systems, and cloud services.

In the past, these vulnerabilities were mainly discovered slowly by top-level security researchers, red teams, and national-level institutions.

Now, models are getting involved.

And it's not just about helping you complete a couple of lines of code. It's more like you hand it a task at night, and by the time you wake up the next morning, the PoC, the exploit chain, and the fix suggestions are already laid out on your desk.
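To make that picture concrete, here is a minimal, purely illustrative sketch of what such an "overnight" loop could look like. Nothing here is a real vendor API: `query_model` is a hypothetical stand-in for whatever agent interface a product might expose, and the report fields simply mirror the article's description (PoC, exploit chain, fix suggestions).

```python
"""Illustrative sketch of an 'overnight' vulnerability-triage loop.

Everything here is hypothetical: `query_model` stands in for whatever
agent API a vendor exposes, and the report fields mirror the article's
description (PoC, exploit chain, fix suggestions), not a real product.
"""
from dataclasses import dataclass, field


@dataclass
class Finding:
    target: str                 # file or component under review
    summary: str                # one-line description of the suspected flaw
    poc: str                    # proof-of-concept input or command
    exploit_chain: list[str] = field(default_factory=list)  # steps to weaponize
    fix: str = ""               # suggested remediation


def query_model(prompt: str) -> list[Finding]:
    """Stand-in for an agentic model call; returns canned data here."""
    return [
        Finding(
            target="parser/http.c",
            summary="length field trusted before bounds check",
            poc="curl -H 'Content-Length: -1' http://localhost:8080/",
            exploit_chain=["overflow header buffer", "overwrite return address"],
            fix="validate Content-Length before allocation",
        )
    ]


def overnight_run(repo: str) -> None:
    # 1. Ask the agent to enumerate suspicious code paths.
    findings = query_model(f"Audit {repo} for memory-safety issues")
    # 2. In a real system, each PoC would be replayed in a sandbox
    #    before anything lands in the morning report.
    for f in findings:
        print(f"[{f.target}] {f.summary}")
        print(f"  PoC: {f.poc}")
        print(f"  Chain: {' -> '.join(f.exploit_chain)}")
        print(f"  Fix: {f.fix}")


if __name__ == "__main__":
    overnight_run("example/webserver")
```

The sketch only shows the shape of the workflow the article describes: enumerate, attempt reproduction, report; the hard parts, sandboxed verification and false-positive filtering, are exactly where real products differ.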

Just a simple imitation of Anthropic Mythos' PR strategy?

Recently, Anthropic also made an unusual move.

Instead of promoting a new model with the usual fanfare, they placed a model called Claude Mythos Preview into a closed-door program called Project Glasswing, making it available only to a handful of technology and security companies.

Anthropic's official reason is that this model is too powerful to be made public immediately.

Some of the test details Anthropic has released genuinely read like a cyber-thriller.

It can find high-risk vulnerabilities in large open-source projects, turn those vulnerabilities into workable attack chains, and even write complex exploits that chain multiple vulnerabilities together.

Wilder still, Anthropic's research team mentioned that even internal engineers with no formal security training can ask it to hunt for remote code execution vulnerabilities overnight, and wake up the next day to a runnable exploit.

Can you believe it?

In the past, we always thought there was a wide gap between finding bugs and actually breaking through a system.

The former is more like a quality check, while the latter is closer to weaponization.

But now, this gap is being filled by models bit by bit.

So when you look back at OpenAI's action, you can understand the sense of tension in the air.

OpenAI actually laid the groundwork in February this year.

When they released GPT-5.3-Codex, they specifically launched a Trusted Access for Cyber program.

The program is invitation-only, with a promised $10 million in API credits for institutions conducting legitimate defense research. OpenAI itself says GPT-5.3-Codex is its most capable cybersecurity model to date, and the first time it has deployed a security stack at this capability level in the cybersecurity field.

Put simply, the model has become so powerful that even they are starting to get nervous.

This is the most peculiar part of this round of changes.

AI companies are desperately creating more powerful models while also trying to restrict who can access these capabilities.

It's like an arms dealer suddenly realizing that what they're selling is no longer a knife but a missile that can find its own target.

Can we say they're wrong? Not really.

From the perspective of security companies, such restrictions are entirely reasonable. You can't make something that can automatically find zero-days (previously undisclosed vulnerabilities) and write exploit chains as freely available as a chatbot.

Especially since Anthropic itself has said that Mythos has found thousands of high-risk vulnerabilities in major operating systems, mainstream browsers, and critical foundational software.

It has even managed to break through the internal sandbox.

The feeling is not just that "the model has become smarter", but that "the social consequences of its capabilities are starting to spill over".

But on the other hand, the reality is also harsh.

Once something like this is proven to exist, it can't go back.

Rob Lee of SANS said, "You can't stop the model from doing code enumeration, nor can you stop it from finding defects in old code libraries because this ability already exists."

An executive at Palo Alto Networks takes a similar view: block one model today, and other models will catch up within weeks or months.

https://www.axios.com/2026/04/09/openai-new-model-cyber-mythos-anthopic

This is what really sends shivers down your spine.

It's not that one company is too powerful.

Once this threshold is crossed, the entire industry will follow suit.

What's even more interesting is that many people in the security circle have started to use an old concept to understand this wave of changes: responsible disclosure.

The software world has argued for decades over whether to disclose a vulnerability immediately upon discovery, or to privately notify the vendor first, wait for a patch to ship, and only then go public.

Now, the release of AI models is becoming more and more like this logic.

It's not about whether you can develop it, but about who to give it to first, when to give it, how to control the pace, and how to prevent it from falling into the wrong hands.

Isn't this a strange snapshot of our era?

Before, we were afraid that AI was too stupid. Now, we're afraid that it's too good at its job.

What's most ironic is that what really brought this issue to the public's attention is not that hackers actually used it to break through a system, but that the model companies themselves got scared first.

It's as if geologists started frantically reinforcing their own floors before the earthquake even hit.

Plot twist!

Speaking of this, there's an easily overlooked point.

Many people, after seeing Axios' initial headline, assumed that OpenAI was going to let only a small circle test its yet-to-be-released flagship model, much as Anthropic is handling Mythos, and delay its public release.

But later, Axios corrected its statement after further verifying with OpenAI.

What OpenAI is going to make available to specific partners this time is not its yet-to-be-released new flagship model, Spud, but a standalone cybersecurity product.

https://x.com/axios/status/2042244444724904190

This difference matters, because it suggests that OpenAI's current thinking may not be to lock up the entire next-generation general-purpose flagship model. Instead, it looks more like packaging the most dangerous, sensitive, boundary-pushing capabilities into a dedicated security product and making it available only to carefully selected defenders.

This actually says a lot.

AI companies are starting to accept the reality that in the future, the most cutting - edge model capabilities will not flow to everyone in the same way.

Some capabilities will be turned into mass-market products, some will be included in closed-door programs, and some will even remain in the hands of a few institutions for a long time.

The model is still the same, but the world will start to be stratified.

Ordinary users will get the layer that is user-friendly, smooth, and powerful enough.

Top-tier enterprises and security institutions will get the deeper, more dangerous, and more defensively valuable layer.

Even deeper, there may be internal capabilities that not even partners can access.

In a sense, this is a bit like nuclear proliferation control for the AI era, and also a bit like another version of "Folding Beijing", the science-fiction story of a city physically divided into strata.

On the surface, you see everyone using AI.

But what really determines the balance of attack and defense may be the capabilities you can't see or touch, the ones that flow only within a small whitelist.
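Purely as an illustration of that "small whitelist" idea, here is a minimal sketch of tiered capability gating. The tier names, capability list, and allowlist are all invented for this example; no vendor's actual access-control scheme is implied.

```python
"""Minimal sketch of tiered capability gating, as the article describes:
a public layer, a partner layer, and an internal-only layer. The tier
names and allowlist are invented for illustration.
"""
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0     # mass-market chat and coding features
    PARTNER = 1    # closed-door programs, vetted defenders only
    INTERNAL = 2   # capabilities that never leave the lab


# Which tier each (hypothetical) capability requires.
CAPABILITY_TIERS = {
    "chat": Tier.PUBLIC,
    "code_completion": Tier.PUBLIC,
    "vuln_discovery": Tier.PARTNER,
    "exploit_generation": Tier.INTERNAL,
}

# A hand-maintained allowlist standing in for partner vetting.
PARTNER_ALLOWLIST = {"vetted-security-co"}


def tier_for(org: str) -> Tier:
    """External orgs top out at PARTNER; INTERNAL is never granted."""
    return Tier.PARTNER if org in PARTNER_ALLOWLIST else Tier.PUBLIC


def can_use(org: str, capability: str) -> bool:
    """An org may use a capability only if its tier is high enough."""
    return tier_for(org) >= CAPABILITY_TIERS[capability]


if __name__ == "__main__":
    print(can_use("random-startup", "vuln_discovery"))        # False
    print(can_use("vetted-security-co", "vuln_discovery"))    # True
    print(can_use("vetted-security-co", "exploit_generation"))  # False
```

The point of the sketch is only the shape of the policy: the most sensitive capabilities sit behind a tier that no external allowlist entry can reach.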

Is this a good thing?

In the short term, this will definitely make people uneasy and may even make technological power more concentrated.

But from a more practical perspective, at a time when models can find vulnerabilities and write exploitation chains on their own, it's better to give these capabilities to defenders first than to let everyone be vulnerable.

This also reminds us that the AI competition is no longer about whose chat is more natural or whose interface is smoother.

The real competition is moving towards the system's lower levels.

It's moving towards browsers, kernels, cloud platforms, and critical infrastructure.

It's moving towards places that people usually don't pay attention to, but that will shake all of society when something goes wrong.

The actions of OpenAI and Anthropic, one after the other, may be a signal.

AI is not just starting to replace humans in work.

AI is starting to enter humanity's oldest and most sensitive games: attack and defense, disclosure and lockdown, openness and classification, efficiency and loss of control.

Reference materials:

https://www.axios.com/2026/04/09/openai-new-model-cyber-mythos-anthopic

This article is from the WeChat official account “New Intelligence Yuan”. Author: New Intelligence Yuan, Editor: Allen. Republished by 36Kr with permission.