HomeArticle

Has an AI model that surpasses Claude Mythos been born?

新智元2026-06-23 08:41
A Fable-class model that is immune to blockades

While everyone was still confused about the sudden disappearance of Claude Fable 5, Sakana AI made a high - profile announcement: Our Fugu is comparable to Fable and is not afraid of export controls.

Claude Fable 5, the most powerful model of Anthropic, has been globally banned, and it's unknown when it will be unblocked.

Previously, OpenRouter launched the Fusion feature, achieving Fable - level intelligence at half the price.

Just now, Sakana AI, an AI company co - founded by Llion Jones, the inventor of Transformer, has also learned this trick.

They claim that the performance of the Fugu Ultra model is comparable to that of Claude Fable and Mythos.

In Fugu's statement, "cutting - edge capabilities without the risk of export controls" is clearly stated as the core selling point, full of sarcasm.

In the industry's strictest engineering, science, and reasoning benchmark tests, Fugu keeps pace with leading models such as Fable and Mythos.

A new trend is coming: Anthropic's Fable 5 has become the new benchmark.

The era of large models is passing, and scheduling will reign supreme

Mythos was released on June 9th and taken offline on June 12th, only surviving for 72 hours.

This is the first time the US government has used its export control authority to force an already - deployed cutting - edge AI model offline: It not only restricts overseas users but also foreign citizens within the US, and even foreign employees of Anthropic itself.

Access rights can disappear in an instant and be cut off overnight.

This has stunned all Claude Fable 5 users.

Now, the ability to schedule models is not just a technical optimization but a necessity.

Following OpenRouter, Sakana AI is moving towards the next frontier of AI, breaking the hegemony of large models.

Collective intelligence is the most practical hedge against this excessive concentration of power.

The core advantage of Fugu lies in:

It orchestrates an entire pool of freely switchable AI agents behind it. When faced with restrictions from a single supplier, it can automatically bypass and switch to another model to continue running, greatly improving the system's resilience.

Sakana Fugu itself is a language model. After training, it can call various large language models in the agent pool — including calling its own instances recursively.

Fugu will dynamically orchestrate the world's top models to complete complex multi - step tasks.

In the future, it's not about whose model is larger, but about who can "orchestrate" the world's models better, more stably, and more autonomously.

The intelligence ceiling of a single model is still important, and the Scaling Law is still important, but the era of Orchestration Models has arrived.

Surpassing Mythos without brute - force computing power

Fugu is not an AI model in the traditional sense.

It is a multi - agent system that utilizes and amplifies their different skill sets, pointing out a new path for the leap in AI capabilities.

Fugu is on par with Fable 5 and Mythos Preview.

In agent programming and software engineering, Fugu - Ultra performs particularly well. It has reached the current optimal level on both the SWE Bench Pro and Terminal Bench 2.1 test benchmarks, and its performance has significantly improved.

On these two benchmarks, Fugu - Ultra has improved by 5%–6% compared to the second - best model.

This is comparable to the improvement brought about by a major version upgrade of these cutting - edge model suppliers.

In scientific reasoning, the Sakana Fugu model also shows a significant improvement in capabilities, even surpassing Mythos Preview and Fable 5.

This discovery further confirms one of the core motivations for training Sakana Fugu:

Intelligent scheduling should become another dimension for improving performance without relying on increasing training computing power.

A hallmark feature of an intelligent scheduler is adaptability.

Throughout the evaluation process, the Fugu model shows continuous and diverse adaptability in its routing allocation.

This indicates that the new model can accurately learn the different skills of each member in the model team and call the corresponding models according to these professional specialties.

To supplement the overall benchmark test scores, the Sakana Fugu model compared three cutting - edge baseline models: Gemini 3.1 Pro (high), Opus 4.8 (max), and GPT 5.5 (xhigh) (randomly anonymized as Model A, Model B, and Model C).

These tasks emphasize real - world agent behavior, such as long - term research, program synthesis, optimization, CAD generation, etc.

How can one model command all intelligence?

Fugu will dynamically schedule the world's top models to tackle complex multi - step tasks.

With just one API, you can directly integrate collective intelligence into your workflow today.

At launch, Sakana Fugu will offer two models and can be accessed via an OpenAI - compatible API:

• Fugu combines powerful performance and low latency in daily work.

• Fugu Ultra is the flagship model, specifically optimized for difficult multi - step problems, aiming for the highest answer quality.

Fugu Ultra will coordinate a deeper and more powerful pool of expert agents to handle high - demand work such as AI research, network security analysis, and patent investigations.

Sakana Fugu is mainly based on two ICLR 2026 papers.

For each task, Fugu has learned to assemble, schedule, and coordinate multiple expert models on its own without relying on human - designed workflows.

Fugu uses a pre - trained language model as the backbone and coordinates the pool of working models based on its own hidden states.

Fugu runs a lightweight selection head in parallel with the language model head of the base model.

It receives a hidden state ℎ from the backbone network as input and outputs 𝐿 logits for each working model in the pool.

The selected model is always called as the working model, which reduces the coordination space and lowers the orchestration latency.

The research team also fine - tunes the singular value scale of the selected parameter matrices in the LM layer, which are indicated by the red diagonals.

In the above figure, the hidden state at the position marked <Head Input> is the input for the lightweight head.

It should be noted that the lightweight head operates on the internal hidden states rather than the final decoded text.

Fugu adopts a two - stage method for training.

First, large - scale supervised fine - tuning (SFT) is carried out.

This stage brings together many single - step tasks, covering programming, mathematics, reasoning, language understanding, and various agent usage scenarios.

After supervised fine - tuning on single - step tasks, Fugu is further optimized by applying evolutionary strategies on end - to - end tasks.

From different coding assistant environments (such as Claude Code, Codex, and OpenCode), they collected real - world multi - round trajectories and constructed end - to - end tasks involving repository context, iterative editing, tool calls, execution feedback, and final task completion.

This stage expands the training distribution from static problems to better reflect the agent workflows in production use.

Conductor is trained with reinforcement learning. It can figure out how to use natural language to coordinate the cooperation between different models, including arranging their communication and designing more precise prompts. In this way, multiple large models working together are better at handling high - difficulty reasoning questions than using any single model alone.

Perhaps the most important thing this