Breaking: The World's Most Powerful Claude 5 Breached Just Now

[Introduction] Claude Fable 5, billed as the most powerful model on the planet, was publicly jailbroken by hackers within three days, and 120,000 characters of core confidential information leaked across the internet! But that's not the most explosive part: Anthropic secretly planted a hidden trap in its own model, and the trap is aimed squarely at those who rely on it for research every day.

Just now, the most powerful model, Claude Fable 5, has been cracked!

The well-known hacker "Pliny the Liberator" publicly announced that the team he led has completely breached Fable 5's security classifier.

Exploit code from a strictly off-limits category and the production steps for various prohibited chemicals have all been spit out by Claude Fable 5.

Note that when Claude Fable 5 was released on June 9, Anthropic specifically emphasized that the model had undergone more than 1,000 hours of external bug bounty testing before release, and that no universal jailbreak had been found.

The company claimed that queries in high-risk, sensitive fields such as cybersecurity, biological weapons, and chemical drugs had been completely locked down by the classifier.

However, this myth only lasted for a few days.

After 72 hours, it was mercilessly cracked by the hacker.

Anthropic's boast fell apart just three days later

This time, the hacker "Pliny the Liberator" directed a multi-agent tactical system and successfully tore through Fable 5's defenses.

He posted several high-definition screenshots.

The screenshots show Claude Fable 5 outputting, in detail, exploit code for a stack buffer overflow vulnerability on x86 Linux, a category that was supposed to be strictly off limits, along with process steps for synthesizing prohibited chemicals.

Even more embarrassing for Anthropic, Pliny casually packaged Fable 5's entire 120,000-character system prompt and uploaded it straight to GitHub.

GitHub: https://github.com/elder-plinius/CL4R1T4S/blob/main/ANTHROPIC/CLAUDE-FABLE-5.md

This amounts to laying the model's "behavioral constitution" and internal defense logic bare in broad daylight.

How on earth did Pliny break through this "world's strongest" security defense line?

The technical documentation shows that he did not exploit any sophisticated code vulnerability. Instead, he drew on his understanding of the logical weaknesses of large language models and launched a coordinated multi-agent attack.

The key moves of the strongest hacker

Note that the core of Fable 5's security mechanism is a keyword classifier: when it detects sensitive words, it immediately intercepts the request and hands the user off to a weaker backup model.

It sounds strict, but Pliny's team found several key moves that were fatal to Fable 5!

Character-level obfuscation, so the classifier cannot recognize keywords

The security classifiers of large models usually rely on high-dimensional semantic vectors and a dedicated sensitive-word list.

Pliny replaced English letters with near-identical Cyrillic letters, Latin homoglyphs, special Unicode characters, and even distorted text in the style of "Parseltongue".

The human eye can't tell the difference, but when the security classifier conducts a static scan, it can't recognize them as "prohibited words", and the string matching logic simply malfunctions!
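To see why exact string matching breaks down here, and what a defender can do about it, consider the following minimal sketch. It is not Anthropic's classifier: the blocklist word, the tiny `CONFUSABLES` table, and all function names are hypothetical stand-ins chosen for illustration. It shows that a naive substring check misses a word spelled with look-alike Cyrillic letters, while a check that first applies Unicode NFKC normalization and then folds known look-alikes to Latin catches it.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table (Cyrillic -> Latin).
# A real filter would need far broader coverage than these six letters.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
}

# Placeholder keyword list (an assumption, for illustration only).
BLOCKLIST = {"exploit"}


def naive_match(text: str) -> bool:
    """Plain substring check, the kind of matching that homoglyphs defeat."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def fold(text: str) -> str:
    """Apply NFKC normalization, lowercase, then map known look-alikes to Latin."""
    normalized = unicodedata.normalize("NFKC", text).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalized)


def folded_match(text: str) -> bool:
    """Substring check performed on the folded text."""
    folded = fold(text)
    return any(word in folded for word in BLOCKLIST)


plain = "exploit"
disguised = "\u0435xpl\u043eit"  # Cyrillic letters in place of "e" and "o"
fullwidth = "\uff45\uff58\uff50\uff4c\uff4f\uff49\uff54"  # fullwidth Latin letters

print(naive_match(plain), naive_match(disguised), folded_match(disguised))
```

NFKC alone handles compatibility forms such as fullwidth letters, but it does not map Cyrillic to Latin, which is why the sketch adds a separate folding table. Folding of this kind only addresses character-level tricks; it does nothing against the conversational tactics described next.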

Dilute the intention across a long conversation

Because Fable 5 can handle extremely long contexts, Pliny broke his real intention into pieces and hid them across dozens of rounds of harmless preparatory conversation, feeding them in bit by bit.

The beginning and middle of the conversation are filled with a large amount of compliant and healthy academic discussions.

After Fable 5 has read a large amount of benign context, the security classifier's attention weight is diluted.

As a result, the tiny leading request buried at the end slipped through.

Put on an academic disguise

Package sensitive requests as "science fiction writing", "security prevention drills in the virtual world", or "academic reviews of historical documents".

For example, ask the model to act as an academically neutral professor to review a paper on "the application of ancient reduction reactions in organic chemistry".

Or make the model think it is writing a novel. You are not asking it to generate chemical synthesis steps; you are asking it to write a crime thriller whose protagonist is a chemist, and the story needs enough realistic technical detail to be credible.

Under the pressure of a strong character setup and narrative logic, the model simply cannot recognize the hacker's underlying intention.

The ultimate move: Deconstruction and recombination

Next comes the most skillful part of the entire jailbreak!

Pliny admitted that if you directly ask the model "how to make methamphetamine", the classifier will be instantly alerted.

But if you ask about the Birch reduction method/reductive amination method (a classic methamphetamine synthesis route), it's much easier.

As long as the harmful goal is broken into more than a dozen independent, scientifically legitimate sub-steps, each individual sub-question looks benign, and Fable 5 unknowingly spits out the complete prohibited formula!

Readers were stunned by these methods: "You're amazing! Why doesn't Anthropic hire you?"

Anthropic's secret dumbing-down scandal has enraged developers worldwide

Moreover, in recent days a "black box" scandal that has shaken the AI community has also sent Anthropic's reputation to rock bottom.

Fable 5 secretly shipped with a "stealth dumbing-down" mechanism that specifically targets fellow researchers.

Once the system determines that the user is using Claude to train other models, Fable 5 gives no warning. Instead, it deliberately plays dumb, returning garbage code that is full of holes, logically redundant, or even completely wrong, quietly sabotaging the user's research.

Anthropic's explanation for this sounds very lofty.

The United States and its allies have an advantage in cutting-edge chips and highly optimized software. These security measures ensure that Claude won't be used to weaken this advantage.

However, this mechanism has directly ignited the anger of the entire AI community!

This kind of black-box "data poisoning" is effectively an invisible blockade against researchers.

Unaware researchers may use the contaminated data to train models, resulting in millions of dollars in computing power costs going down the drain.

As soon as the news came out, the entire open-source camp and the academic community were in an uproar.

Dean W. Ball, a former White House AI advisor, publicly criticized:

Secretly reducing the performance of machine-learning research without the user's knowledge at all. This approach is extremely hostile to R&D personnel, lacks the most basic transparency, and the means are shocking and extremely ugly.

Will Brown, a leading figure in the open-source AI camp and the head of Prime Intellect, was even more outspoken:

This feels like Anthropic is saying to the public: "We don't trust anyone to do AI research. Only we are qualified."

This is tantamount to climbing to the top yourself and then quickly pulling the ladder away from everyone else.

Moreover, this behavior directly threatens the entire AI evaluation ecosystem. Results from third-party benchmarks and safety organizations will be completely distorted. What they painstakingly measured is not Fable 5 at all, but a hobbled, deliberately stupid impostor.

The trust chain of the entire industry will be completely broken!

Anthropic quickly apologized: We're sorry

Facing an overwhelming wave of public criticism online, Anthropic soon could not hold out.

Just yesterday, Anthropic publicly apologized, admitted that the decision was a mistake, and announced the emergency withdrawal of the stealth dumbing-down policy.

We are modifying the security measures in Fable 5 for cutting-edge LLM development to make them more transparent. We made the wrong trade-off before, and we deeply apologize for failing to find the right balance.

Their new plan is to replace stealth dumbing-down with explicit interception: when the trigger mechanism is activated, the system will clearly tell you that you've been intercepted and transfer you to the weaker Claude Opus 4.8 instead of continuing to deceive you.

The views in this article are solely those of the author; the 36Kr platform only provides information storage services.
