Just Now: Anthropic Releases Sonnet 5, with Performance Close to Opus 4.8 but Not Necessarily Cheaper

The most agentic Sonnet model yet!

Just now, Anthropic officially released a new model, Claude Sonnet 5, which it calls "the most agentic Sonnet model to date." It can make plans, use tools such as browsers and terminals, and operate autonomously at a level that just a few months ago required larger and more expensive models.

Sonnet 5 shows significant performance improvements over Sonnet 4.6 in reasoning, tool use, programming, and knowledge work. It comes closer to Opus 4.8 but at a lower price.

According to Anthropic, the era of AI agents began for developers with Sonnet-class models: Claude Sonnet 3.5, 3.6, and 3.7 were among the first models to show outstanding programming and tool-use capabilities. Recently, however, the biggest gains in agentic capability have come mainly from Opus-class models.

Claude Sonnet 5 significantly narrows this gap, approaching Opus 4.8's performance at a lower price. Its improvements over its predecessor, Sonnet 4.6, across the key dimensions of agent performance (reasoning, tool use, programming, and knowledge work) are shown in the following figure:

The following figure compares Sonnet 5, Sonnet 4.6, and Opus 4.8 at different "effort levels" on BrowseComp, an agentic search evaluation, and OSWorld-Verified, a computer use evaluation:

Sonnet 5 (orange line) shows clear performance improvements over Sonnet 4.6 (gray line) and offers a wider range of cost-performance options than Opus 4.8 (yellow line).
At a medium effort level, Sonnet 5 significantly improves cost efficiency; at a higher effort level, it can match Opus 4.8 on some tasks.
Across Sonnet 5 and Opus 4.8, users can adjust the effort level for each task to find the balance between cost and performance that best suits their needs.

The cost-performance curves at different effort levels are shown in the above figure. The previous best Sonnet model, Sonnet 4.6, lagged far behind Opus 4.8. Sonnet 5 offers a wider range of cost-performance options than Sonnet 4.6 and in some cases reaches the capability level of Opus 4.8. The chart prices Sonnet 5 at $3 per million input tokens and $15 per million output tokens; with the early-access price that runs until August 31 ($2 per million input tokens and $10 per million output tokens), its actual cost is even lower than the chart shows. Opus 4.8 is priced at $5 per million input tokens and $25 per million output tokens.
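To make the per-token prices above concrete, here is a minimal sketch that prices one workload under each of the three quoted schemes. The `Pricing` class, the constant names, and the 200k-input / 20k-output workload are hypothetical illustrations, not anything from Anthropic's API; only the dollar figures come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens, as quoted in the article."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost for a given number of input and output tokens."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# Hypothetical constant names; the prices are the ones stated in the article.
SONNET_5_STANDARD = Pricing(input_per_m=3.0, output_per_m=15.0)
SONNET_5_INTRO = Pricing(input_per_m=2.0, output_per_m=10.0)  # until August 31, 2026
OPUS_4_8 = Pricing(input_per_m=5.0, output_per_m=25.0)

# Hypothetical workload: 200k input tokens and 20k output tokens.
input_tokens, output_tokens = 200_000, 20_000
for name, pricing in [
    ("Sonnet 5 (standard)", SONNET_5_STANDARD),
    ("Sonnet 5 (introductory)", SONNET_5_INTRO),
    ("Opus 4.8", OPUS_4_8),
]:
    print(f"{name}: ${pricing.cost(input_tokens, output_tokens):.2f}")
```

For this workload the script prints $0.90, $0.60, and $1.50 respectively. Because every Sonnet 5 standard price is exactly 60% of the corresponding Opus 4.8 price, equal token counts always cost 40% less on Sonnet 5; it only becomes more expensive per task if it consumes more than 5/3 (about 1.67) times as many tokens as Opus 4.8 for the same work.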

Feedback from Anthropic's early-access partners has been consistent: Sonnet 5 is more agentic than its predecessors. Testers report that it completes complex tasks on which previous Sonnet models would stop partway, that it checks its own output without being explicitly prompted to, and that it does all of this agentic work at an attractive price.

Safety Assessment

Anthropic's pre-deployment safety assessment found that Sonnet 5 is generally improved over Sonnet 4.6. On agentic safety, the model is better at refusing malicious requests and at resisting hijacking through prompt injection attacks. Its hallucination rate and sycophancy rate are both lower than Sonnet 4.6's. In the automated behavioral audit, which tests a wide range of inappropriate behaviors such as assisting with misuse and deception, Sonnet 5 scores lower (that is, safer).

However, compared to the more capable Opus 4.8 and Claude Mythos Preview, it does show a slightly higher rate of inappropriate behavior in this assessment.

The above figure shows the rate of inappropriate behavior in the automated behavioral audit, which tests a wide range of undesirable behaviors across many scenarios and contexts (see Section 6.4 of the Sonnet 5 system card for the full list and per-behavior results). Sonnet 5's overall rate of inappropriate behavior is lower than Sonnet 4.6's but higher than those of Mythos Preview and Opus 4.8.

Anthropic states that it did not specifically train Sonnet 5 for cybersecurity tasks. The model can handle some routine, benign cyber tasks, but on evaluations of potentially dangerous cyber skills (such as developing exploits for software vulnerabilities), it performs significantly worse than models like Opus 4.8 and Mythos 5.

The following figure shows the results of one such assessment, which tested the model's ability to develop exploits for Firefox browser vulnerabilities. Sonnet 5 never developed a fully working exploit, but its partial success rate was slightly higher than Sonnet 4.6's. This small gain may reflect improvements in general intelligence rather than targeted training.

The above figure shows how often models succeeded in developing exploits for software vulnerabilities in Firefox 147 (the assessment was developed in cooperation with Mozilla, and all of the vulnerabilities have been fixed in Firefox 148). For each model, the left bar shows how often the model, without safeguards, developed a working exploit, and the right bar shows how often it achieved partial success. Neither Sonnet model developed a working exploit (both scored 0.0%), and Sonnet 5's partial success rate was slightly higher than Sonnet 4.6's. Both Sonnet models' cyber capabilities are significantly weaker than those of Opus 4.8 and Mythos 5.

Because Sonnet 5 is slightly more capable than its predecessor on these tasks, Anthropic has enabled cybersecurity safeguards by default. These safeguards, which detect and block dangerous cyber use in real time, are the same as those used for Claude Opus 4.7 and 4.8. Because Anthropic judges Sonnet 5's overall cybersecurity risk to be low, they are less strict than the safeguards on Fable 5, which block a wider range of cybersecurity tasks.

For the full assessment report of Sonnet 5 in multiple security and capability evaluations, please refer to the "Claude Sonnet 5 System Card."

Pricing

As of today, Claude Sonnet 5 is available on all platforms. To celebrate the release, Anthropic is offering a limited-time introductory price:

From now until August 31, 2026: $2 per million tokens for input and $10 per million tokens for output
After that, the standard pricing will resume: $3 per million tokens for input and $15 per million tokens for output

Anthropic also announced higher rate limits across Chat, Cowork, Claude Code, and the Claude platform to accommodate the heavier token consumption of higher effort levels.

Notes

Cybersecurity Verification

Sonnet 5 has been included in Anthropic's "Cybersecurity Verification Program." This program is now available on the following platforms:

Claude's native platform
Claude platform on AWS
Claude in Microsoft Foundry (hosted on Azure and Anthropic)

Claude on Google Vertex will support the program soon.

Organizations already enrolled in the program automatically receive the same access for Sonnet 5 without reapplying. If your cybersecurity work requires fewer safeguard restrictions, Anthropic recommends using Claude Opus 4.8.

Tokenizer Update and Pricing Explanation

Sonnet 5 builds on Sonnet 4.6 but uses a new tokenizer that improves text processing (similar to the tokenizer change introduced with Claude Opus 4.7).

As a result, the same input now maps to more tokens: roughly 1.0 to 1.35 times as many, depending on the content type.

For this reason, Anthropic set the early-access price so that overall usage costs stay roughly the same as users transition to Sonnet 5.
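The interaction between the larger token counts and the introductory price can be checked with a little arithmetic. The sketch below rests on one loudly labeled assumption: that Sonnet 4.6 was billed at $3 / $15 per million input / output tokens, which the article does not state. The function and constant names are hypothetical.

```python
# Prices in USD per million (input, output) tokens.
# ASSUMPTION: Sonnet 4.6 was billed at $3 / $15; the article does not state this.
SONNET_4_6 = (3.0, 15.0)
SONNET_5_INTRO = (2.0, 10.0)     # introductory price, until August 31, 2026
SONNET_5_STANDARD = (3.0, 15.0)  # standard price afterwards


def relative_cost(new_price, old_price, token_multiplier):
    """Cost of the same content on the new model, relative to the old model.

    The new tokenizer maps the same text to `token_multiplier` times as many
    tokens, so each side's cost scales by (new / old price) * multiplier.
    Returns an (input, output) tuple; 1.0 means "same cost as before".
    """
    return tuple(new / old * token_multiplier
                 for new, old in zip(new_price, old_price))


# Both ends of the 1.0 to 1.35x range quoted in the article, plus a value in between.
for multiplier in (1.0, 1.2, 1.35):
    intro_in, intro_out = relative_cost(SONNET_5_INTRO, SONNET_4_6, multiplier)
    std_in, std_out = relative_cost(SONNET_5_STANDARD, SONNET_4_6, multiplier)
    print(f"{multiplier:.2f}x tokens: intro {intro_in:.2f}x / {intro_out:.2f}x, "
          f"standard {std_in:.2f}x / {std_out:.2f}x")
```

Under that assumption, the introductory price keeps the cost of identical content at about 0.67x to 0.90x of Sonnet 4.6's, while the standard price works out to 1.0x to 1.35x, so the same content could cost up to about 35% more once standard pricing resumes.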

Rate Limit Adjustment Explanation

On April 26, 2026, Anthropic had already raised the rate limits for Sonnet and Haiku models at every usage tier and simplified the plans on the native Claude platform into three tiers: Start, Build, and Scale.

In this update, Anthropic further raised rate limits across Chat, Cowork, Claude Code, and the Claude platform to match the heavier token consumption of higher effort levels.

You can view the current level and specific limits in the Claude Console or refer to the documentation for more details.

Evaluation Score Corrections (Supplement)

Humanity’s Last Exam: Anthropic updated the grading model for this evaluation and rescored Sonnet 4.6 at 34.6% (without tools) and 46.8% (with tools), which is why these scores differ from those reported in the Sonnet 4.6 release blog.
OSWorld-Verified: Anthropic changed how this evaluation is run so that it more realistically reflects real-world performance, and rescored Sonnet 4.6 at 78.5%; this is likewise why the score differs from the one in the Sonnet 4.6 release blog.

Feedback from Developers

As soon as Claude Sonnet 5 was released, people started evaluating it.

Nicolas Bustamante said that one of the things he likes about Sonnet 5 is that it is fast and optimized for agents: "My favorite example is browser use: it's fast and secure."

According to the system card, in browser use scenarios prompt injection attacks succeed against Sonnet 5 only 0.93% of the time, compared with 31.5% for Opus 4.8 and 50.7% for Sonnet 4.6.

However, some users complained: "It's too expensive."

According to Artificial Analysis, running its Intelligence Index with Claude Sonnet 5 costs $2.29 per task, about twice as much as Sonnet 4.6 and about 15% more than Claude Opus 4.8. The increase is driven entirely by higher token usage, and it makes Sonnet 5 one of the most expensive models to run, second only to Claude Fable 5.
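The Artificial Analysis ratios can be turned around to estimate what the other two models cost per task. This is a rough back-of-the-envelope sketch: the ratios ("about twice", "about 15%") are rounded in the source, so the implied values are approximate, and the constant and function names are hypothetical.

```python
# Figures quoted from Artificial Analysis in the article (ratios are rounded).
SONNET_5_COST_PER_TASK = 2.29  # USD per task on the Intelligence Index
RATIO_VS_SONNET_4_6 = 2.0      # "about twice as much as Sonnet 4.6"
RATIO_VS_OPUS_4_8 = 1.15       # "about 15% higher than Opus 4.8"


def implied_cost(reference_cost: float, ratio: float) -> float:
    """Back out another model's per-task cost from reference_cost = ratio * other."""
    return reference_cost / ratio


sonnet_4_6 = implied_cost(SONNET_5_COST_PER_TASK, RATIO_VS_SONNET_4_6)
opus_4_8 = implied_cost(SONNET_5_COST_PER_TASK, RATIO_VS_OPUS_4_8)
print(f"Implied Sonnet 4.6 cost: ~${sonnet_4_6:.2f} per task")
print(f"Implied Opus 4.8 cost:   ~${opus_4_8:.2f} per task")
```

This works out to roughly $1.15 per task for Sonnet 4.6 and roughly $1.99 for Opus 4.8, which illustrates how a model with lower per-token prices can still cost more per task.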

So, what do you think of the new model? Share your thoughts in the comments!

Reference Links:

https://x.com/claudeai/status/2072017450611142835

https://www.anthropic.com/news/claude-sonnet-5

https://x.com/ArtificialAnlys/status/2072062595482456431

This article is from the WeChat official account

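The views expressed in this article are solely those of the author; the 36Kr platform only provides information storage services.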
