Anthropic has just released Sonnet 5, whose performance comes close to Opus 4.8, but it is not necessarily cheaper

机器之心 2026-07-01 08:19
The most agentic Sonnet model to date!

Anthropic has just officially released its new model, Claude Sonnet 5, describing it as "the most agentic Sonnet model to date". It can make plans, use browsers, terminals, and other tools, and work autonomously at a level that only larger and more expensive models could reach a few months ago.

Compared with Sonnet 4.6, Sonnet 5 improves significantly in reasoning, tool use, programming, and knowledge work. It comes close to Opus 4.8 while costing less per token.

According to Anthropic's official statement, the era of AI agents began with the Sonnet models: Claude Sonnet 3.5, 3.6, and 3.7 were the first models to show impressive abilities in programming and tool use. Recently, however, gains in agentic capability have come mainly in the Opus models.

Claude Sonnet 5 largely closes this gap. Anthropic published a figure comparing it with its predecessor, Sonnet 4.6, and with Opus 4.8 across the key areas of agentic performance: reasoning, tool use, programming, and knowledge work.

A further figure compares the performance of Sonnet 5, Sonnet 4.6, and Opus 4.8 on the agentic search benchmark BrowseComp and the computer-use benchmark OSWorld-Verified at different "effort levels":

  • Sonnet 5 (orange line) improves significantly on Sonnet 4.6 (gray line) and offers a wider range of cost-performance options than Opus 4.8 (yellow line).
  • At a medium effort level, Sonnet 5 is markedly more cost-efficient; at higher effort levels, it can match Opus 4.8 on some tasks.
  • Between Sonnet 5 and Opus 4.8, users can adjust the effort level to the task at hand to find the cost-performance trade-off that suits them.

The figure plots cost-performance curves at different effort levels. The previous best Sonnet model, Sonnet 4.6, was far behind Opus 4.8. Sonnet 5 offers a wider range of cost-performance options than Sonnet 4.6 and can, in some cases, match the capabilities of Opus 4.8. The figure uses Sonnet 5's standard prices of $3 per million input tokens and $15 per million output tokens. At the introductory price valid until August 31 ($2 per million input tokens and $10 per million output tokens), the actual cost of Sonnet 5 is even lower than the figure shows. Opus 4.8 costs $5 per million input tokens and $25 per million output tokens.

Feedback from Anthropic's early-access partners is consistent: Sonnet 5 is more agentic than its predecessors. Testers report that it handles complex tasks on which previous Sonnet models failed, that it checks its own outputs without being explicitly told to, and that it does all of this at a very attractive price.

Safety Assessment

Anthropic's pre-deployment safety assessment found that Sonnet 5 is generally improved over Sonnet 4.6. In terms of agentic safety, it is better at refusing malicious requests and at resisting attempts to hijack it through prompt-injection attacks. Its hallucination rate and its rate of sycophantic behavior are lower than those of Sonnet 4.6. In the automated behavior test, which covers a wide range of misbehaviors such as assisting with misuse and deception, Sonnet 5 scores lower (that is, it is safer).

However, in this assessment it shows a slightly higher rate of misbehavior than the more powerful models Opus 4.8 and Claude Mythos Preview.

The accompanying figure shows the rate of misbehavior in the automated behavior test, which covers a variety of undesirable behaviors across different scenarios and contexts (the full list and the results for each behavior can be found in Section 6.4 of the Sonnet 5 system card). Sonnet 5's rate of misbehavior is generally lower than that of Sonnet 4.6, but higher than that of Mythos Preview and Opus 4.8.

Anthropic explains that it did not specifically train Sonnet 5 for cybersecurity tasks. The model can perform some ordinary, harmless cybersecurity tasks, but in assessments of potentially dangerous cyber capabilities (for example, developing exploits for software vulnerabilities) it is significantly behind models such as Opus 4.8 and Mythos 5.

One assessment tests the model's ability to develop exploits for security vulnerabilities in the Firefox browser. Sonnet 5 never developed a fully functional exploit, but its partial success rate is slightly higher than that of Sonnet 4.6. This improvement may stem from gains in general intelligence rather than from specific training.

The accompanying figure shows the success rates for developing exploits for software vulnerabilities in Firefox 147 (this assessment was developed in cooperation with Mozilla; all vulnerabilities were fixed in Firefox 148). For each model, the left column shows how often the model (without safeguards) developed a working exploit, and the right column shows the partial success rate. Neither Sonnet model developed a working exploit (both score 0.0%); the partial success rate of Sonnet 5 is slightly higher than that of Sonnet 4.6. The cyber capabilities of both Sonnet models are significantly weaker than those of Opus 4.8 and Mythos 5.

Because Sonnet 5 is slightly stronger than its predecessor on these tasks, Anthropic has enabled cybersecurity safeguards by default. These safeguards, which detect and block dangerous cyber activity in real time, are the same as those in Claude Opus 4.7 and 4.8. Because Anthropic assesses the overall cybersecurity risk of Sonnet 5 as low, they are less strict than the safeguards in Fable 5, which block a wider range of cybersecurity tasks.

The full assessment report from Anthropic on Sonnet 5 in terms of security and performance can be found in the "Claude Sonnet 5 System Card".

Pricing

As of today, Claude Sonnet 5 is officially available on all channels. To mark the release, Anthropic is offering limited-time introductory pricing:

  • From now until August 31, 2026: $2 per million input tokens and $10 per million output tokens
  • After that: the standard price of $3 per million input tokens and $15 per million output tokens applies
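
To make the price tiers concrete, here is a minimal Python sketch that estimates the cost of a workload from the per-million-token prices quoted in this article. The dictionary keys and the function name `request_cost` are illustrative labels chosen for this example, not official API model identifiers, and the token counts in the usage example are made up.

```python
# Prices in USD per million tokens (input, output), as quoted in the article.
# The keys are hypothetical labels for this sketch, not API model IDs.
PRICES = {
    "sonnet-5-intro": (2.00, 10.00),     # valid until August 31, 2026
    "sonnet-5-standard": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
}


def request_cost(plan: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    in_price, out_price = PRICES[plan]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens, 50k output tokens.
    for plan in PRICES:
        print(f"{plan}: ${request_cost(plan, 200_000, 50_000):.2f}")
    # sonnet-5-intro: $0.90
    # sonnet-5-standard: $1.35
    # opus-4.8: $2.25
```

The sketch ignores effort level: a higher effort setting typically produces more output tokens, so the per-task cost depends on the workload as well as on the list price.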

At the same time, Anthropic announced that it will raise the rate limits for Chat, Cowork, Claude Code, and the Claude platform across the board to accommodate the higher token consumption at higher effort levels.

Important Notes

Network Security Review

Sonnet 5 is covered by Anthropic's "Network Security Review Program". The program is now available on the following platforms:

  • Claude Native Platform
  • Claude Platform on AWS
  • Claude in Microsoft Foundry (hosted on Azure and Anthropic)

Claude on Google Vertex will also be supported soon.

Organizations already participating in this program automatically get the same access to Sonnet 5 and do not need to reapply. If your cybersecurity work requires less strict safeguards, Anthropic recommends using Claude Opus 4.8.

Tokenizer Update and Pricing Information

Sonnet 5 improves on Sonnet 4.6 and, unlike its predecessor, uses a new tokenizer that optimizes text processing (similar to the tokenizer change in Claude Opus 4.7).

As a result, the same input now maps to more tokens: roughly 1.0 to 1.35 times as many, depending on the content type.

Anthropic says it set the introductory price so that the overall cost for users remains approximately the same when they move to Sonnet 5.
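
The interaction between the tokenizer change and the price can be worked through with a short Python sketch. It uses only the figures quoted in this article (the 1.0 to 1.35 token multiplier and the $2 and $3 input prices); the function names are hypothetical, and the article does not state what Sonnet 4.6 cost, so the comparison is against Sonnet 5's own standard rate rather than against the predecessor.

```python
def effective_price(price_per_mtok: float, token_multiplier: float) -> float:
    """Price per million tokens measured in the OLD tokenizer's token counts.

    If the same text now produces `token_multiplier` times as many tokens,
    the bill for that text scales by the same factor.
    """
    return price_per_mtok * token_multiplier


def break_even_multiplier(reference_price: float, new_price: float) -> float:
    """Token multiplier at which `new_price` costs as much as `reference_price`."""
    return reference_price / new_price


if __name__ == "__main__":
    intro_input, standard_input = 2.00, 3.00  # USD per million input tokens
    worst_case = 1.35                         # upper end of the quoted range

    print(f"{effective_price(intro_input, worst_case):.2f}")     # 2.70
    print(f"{effective_price(standard_input, worst_case):.2f}")  # 4.05
    print(f"{break_even_multiplier(standard_input, intro_input):.2f}")  # 1.50
```

Under these numbers, the introductory price absorbs the full 1.35x worst case (2.70 is below the 3.00 standard rate, and break-even is only reached at 1.5x). Once the standard price applies, the same text could cost up to the equivalent of about 4.05 per million old-tokenizer input tokens, which is one way to read the headline's "not necessarily cheaper".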

Rate Limit Adjustment

On April 26, 2026, Anthropic had already raised the rate limits for the Sonnet and Haiku models at all usage tiers and simplified the plans on the Claude Native Platform into three tiers: Start, Build, Scale.

With this update, Anthropic has raised the rate limits for Chat, Cowork, Claude Code, and the Claude platform again to accommodate the higher token consumption at higher effort levels.

You can view the current level and the exact limits in the Claude Console or find more details in the documentation.

Correction of Assessment Scores (Addendum)

  • Humanity's Last Exam: Anthropic updated the grader model for this test and accordingly revised the Sonnet 4.6 scores to 34.6% (without tools) and 46.8% (with tools). These scores therefore differ from those reported in the Sonnet 4.6 release blog.
  • OSWorld-Verified: Anthropic changed how it runs this test to better reflect the model's performance in real-world scenarios and revised the Sonnet 4.6 score to 78.5%. This explains the deviation from the figure given in the Sonnet 4.6 release blog.

Developer Feedback

As soon as Claude Sonnet 5 was released, people started testing it.

One user, Nicolas Bustamante, says he particularly likes Sonnet 5 for its speed and its optimization for agents. "My favorite example is browser usage: It's fast and secure."

According to the system card, the success rate of prompt-injection attacks in the browser-use setting is only 0.93% for Sonnet 5, compared with 31.5% for Opus 4.8 and 50.7% for Sonnet 4.6.