Mythical-level Claude 5 takes the top spot.
Arena has just released its new Agent Arena leaderboard!
The mythical Claude Fable 5 shot straight to first place, beating the former leader GPT-5.5 and its in-house sibling Opus 4.8.
According to the data, Fable 5 achieved a "comprehensive net improvement" of up to 11.2%.
That is the largest point margin in the history of this leaderboard!
Even more surprising, Fable 5 ranked first in almost every test in Vals AI's third-party evaluation.
Just 24 hours after its release, Claude Fable 5 is proving terrifyingly powerful!
Claude Fable 5 takes first place by a wide margin
Setting the largest point margin in AI history
The Agent Arena leaderboard scores models on five signals, and Fable 5 holds a commanding lead on the two most important ones.
The first is task confirmation success rate (18.2%). The second is the ratio of positive reviews to complaints (30.6%).
The chart below captures Fable 5's overwhelming dominance.
In other words, Fable 5 holds a cliff-like lead on the two indicators closest to real-world work: "can the work get done, and are users satisfied?"
Fable 5 is also extremely strong on individual abilities, taking first place on both the Code Arena and Text Arena leaderboards.
Coding stands out in particular. It posted an astonishing 72% win rate in front-end head-to-head matchups and finished a staggering 98 points ahead of the field, outclassing its rivals entirely.
Fable 5 also ranks first on the tool-hallucination metric.
Several authoritative benchmarks further confirm its dominance.
On the Artificial Analysis Intelligence Index, Fable 5 scored 64.9 to take first place, leading by more than 5 points.
Even more striking, on the GDPval-AA leaderboard, which measures real-world work tasks, its Elo soared past 1932. That leaves Opus 4.8 far behind and resets the ceiling for the industry.
Powerful at front-end coding
Don't assume it can only score well on "exams." Claude Fable 5 is also very strong in real-world use and lives up to its reputation.
Next up is a hardcore visual challenge: simulating fluid ink as it dissipates.
Effects like this are often used to probe a model's upper limits. Fable 5 nailed it on the first try, producing a clean and polished result.
In another example, when asked to create a Windows system, it unexpectedly produced a complete, usable web-based version of Windows, with login, notifications, the Edge browser, FreeCell, and more.
It even comes with Copilot, a Minecraft clone, visual gameplay, and several 3D worlds. This isn't just building a system; it's building an ecosystem.
And from a single sentence, Fable 5 conjured up The Elder Scrolls, the 2011 Game of the Year.
Game studios might as well pack up and go home.
Surprisingly, Claude Fable 5 (max) also recreated Minecraft in HTML, with impressively good results.
The blocks, the world, and the gameplay are all well presented, and it even added background music on its own.
It was then asked to visualize a neural network's attention mechanism and show how a small language model generates stories.
The result is truly amazing: it built a real, runnable model that is now running in real time in my browser via WebGPU.
The flow of attention and the generation of text unfold before my eyes as particles and physics.
Hand-building an emulator in just 24 hours
In Mechanize's evaluation, Fable 5 also posted the highest score on GBA Eval, at 74.5%.
Within 24 hours, it hand-built a game emulator that runs every game flawlessly.
In under 2 hours, its performance had already surpassed that of Opus 4.8.
Token volume soars to 205 billion as the price doubles
As the first publicly available Mythos-level model, Fable 5 has outpaced even Anthropic's own flagship in usage since launch.
Today, OpenRouter released its latest data.
Within 24 hours of release, Fable 5's daily token volume soared to about 205 billion, compared with 147 billion for Opus 4.8.
Even more notable is the price. Fable 5 costs $10 / $50 per million tokens, exactly double the price of Opus 4.8.
With both higher usage and double the unit price, the costs add up fast. Ethan Mollick, a professor at the Wharton School, said bluntly that once Fable starts a workflow, tokens get burned through quickly.
"Capability" begins to outrun "control"
Anthropic's releases are no longer simply a case of "a new model is out." The pace is clearly accelerating.
This year's release timeline is scarier than any single benchmark.
It took 42 days to go from Opus 4.7 to Opus 4.8, but only 12 days to go from Opus 4.8 to Fable 5.
The intervals are shrinking while the leaps grow larger.
So what we should really be watching is not who tops a given leaderboard, but how long this steep climb can last.
As AI iteration cycles accelerate, the window for humans to learn to "tame" it is narrowing just as fast.
References:
https://x.com/arena/status/2064807170714358193?s=20
https://x.com/OpenRouter/status/2064788002606309723?s=20
This article is from the WeChat official account "New Intelligence Yuan" (author: ASI Revelation; editor: Taozi) and is published by 36Kr with authorization.