The new Claude model 4.6 is here, and even more jobs are on the line: Wall Street financial analysts, compiler developers, white-hat hackers, PPT designers... all of them are having to throw in the towel.
You barely open your eyes in the morning and Anthropic has already unveiled a new model. Let Claude Opus 4.6 be the first to wish you a happy new year!
As soon as the news broke, shares of financial data provider FactSet plunged as much as 10% during the session. S&P Global, Moody's, and Nasdaq Inc. all fell, and the major indices tumbled with them.
This is already the second time this week that Anthropic has turned the markets upside down.
A few days ago it quietly released a plugin for automated legal work, which promptly sent billion-dollar software stocks tumbling.
Investors' panic boils down to one question: which companies can guarantee they won't be replaced by AI within the next few years? Whoever can't gets sold off.
Who would have thought that Anthropic would strike even harder today.
Until now, the perception of Claude was that it was far and away the strongest in programming.
Claude Opus 4.6 scoffs and shatters this perception in one fell swoop: I'm strong in many more areas!
At least according to the official statement, Claude Opus 4.6 can excel at financial analysis, research, and the Office trifecta.
It says directly on the official website:
On GDPval-AA (a benchmark for economically valuable knowledge-work tasks in finance, law, and other fields), Opus 4.6 leads the next-best industry model, OpenAI's GPT-5.2, by 144 Elo points.
(In plain terms: Opus 4.6 scores higher than GPT-5.2 in roughly 70% of comparisons in this evaluation; 50% would mean the two are evenly matched.)
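For the curious, that roughly 70% figure follows directly from the standard Elo expected-score formula applied to a 144-point gap. A quick sanity check in Python:

```python
# Standard Elo expected-score formula: P(win) = 1 / (1 + 10^(-gap/400)),
# applied to the 144-point gap quoted above.
elo_gap = 144
win_prob = 1 / (1 + 10 ** (-elo_gap / 400))
print(f"{win_prob:.1%}")  # prints 69.6%
```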
Of course, it is also still unbeatable in programming.
It posted the top score on the agentic coding benchmark Terminal-Bench 2.0 and also led all other frontier models on Humanity's Last Exam.
The good news: the performance gains come without a price increase. Opus 4.6 costs the same as before: $5 per million input tokens and $25 per million output tokens.
(For better readability, the new model will be abbreviated as Opus 4.6 hereinafter)
Back on top, with a 1M context window and adaptive thinking
The most obvious improvement in Opus 4.6 is its enormous context window of 1M tokens. This is the first time Anthropic has offered a context window of this size in an Opus-tier model.
It significantly alleviates the "context degradation" that earlier models suffered when working through long texts.
On the MRCR v2 8-needle 1M benchmark, essentially a "needle in a haystack" game, Opus 4.6 scored 76%, while Claude Sonnet 4.5 managed only 18.5%.
Search ability improves along with it.
On BrowseComp, which measures the ability to research hard-to-find information online, Opus 4.6 ranked first in the industry: it performed best at deep, multi-step agentic search and can pinpoint key information in long documents.
Opus 4.6 also introduces adaptive thinking.
Previously, developers using Claude faced a binary choice: extended thinking on or off.
Now the model decides for itself when deeper analysis is warranted.
(To be honest, ChatGPT got there first. Next time, please ship features this good a little faster!)
The associated "effort" parameter offers four levels (low, medium, high, max) and defaults to "high". If the model spends too long thinking about a task, you can manually lower the level.
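Here is a minimal sketch of how selecting the effort level might look from the Python SDK. The exact parameter name and where it goes in the Messages API are assumptions based on the description above, not confirmed against official docs:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    # Hypothetical field: the article says "effort" defaults to "high";
    # lower it if the model overthinks simple tasks.
    extra_body={"effort": "medium"},
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
)
print(response.content[0].text)
```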
Another useful feature is context compaction.
When a conversation approaches the context-window limit, the model automatically summarizes and replaces the older content, which makes long conversations and agent tasks much more manageable.
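The announcement doesn't describe the mechanics, but the idea is simple enough to sketch. The helpers `count_tokens` and `summarize` below are hypothetical stand-ins, and per the article the API does all of this for you automatically:

```python
# Illustration of the compaction idea only, not the actual API behavior.
CONTEXT_LIMIT = 1_000_000          # Opus 4.6 context window per the article
COMPACT_THRESHOLD = 900_000        # assumed: compact when roughly 90% full

def maybe_compact(messages, count_tokens, summarize):
    """Replace older turns with one summary message once the history nears the limit."""
    if count_tokens(messages) < COMPACT_THRESHOLD:
        return messages
    older, recent = messages[:-10], messages[-10:]   # keep the last few turns verbatim
    summary = summarize(older)                       # condense everything older
    compacted = [{"role": "user", "content": f"Summary of earlier conversation: {summary}"}]
    return compacted + recent
```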
Conquering core areas: programming, knowledge work, search, and analysis
According to the official blog, hardly any other model can keep up with Opus 4.6.
Opus 4.6 has made significant improvements in core areas such as programming, knowledge work, search, and analysis.
It has outperformed its predecessor and competitor models in several tests, as shown here:
After getting a general overview of the model, let's look at the individual aspects.
First, the programming abilities.
Opus 4.6 achieved the highest score on Terminal-Bench 2.0.
In terms of the practical abilities behind those numbers, Opus 4.6 plans tasks better, works reliably in large codebases, and is more accurate at code review and debugging.
Moreover, it can detect its own errors.
Opus 4.6 also handles multiple programming languages and can solve software-engineering problems across them.
It can migrate codebases with millions of lines of code like an experienced software engineer, in half the time such work usually takes.
As I'm writing this, I can't help but wonder:
Will software engineers be so happy about this news that they'll have less hair loss, or will it be even worse for them? (Thought process.jpg)
Second, Opus 4.6 is also breaking into the area of traditional office work.
This time it's targeting the Office trifecta: Word, Excel, and PowerPoint.
It can directly import disordered, unstructured data into Excel, find the appropriate table structure on its own, and process multiple complex steps in one go.
It can remember your company's PPT template, including fonts and layout, so that the generated PPTs don't have an AI feel, and the boss will think you stayed up all night creating them.
In collaborative work scenarios, Opus 4.6 can run multiple tasks in parallel for the user, such as performing a financial analysis while summarizing the findings in a document.
Does Anthropic intend to bring Claude out of the chat window into other areas?
Third, let's talk about its improved analytical abilities.
First, a summary:
Opus 4.6 is even more capable at cross-domain analysis.
On Humanity's Last Exam, a test of complex analysis across disciplines, Opus led all frontier models.
In the legal field, Opus 4.6 achieved 90.2% on BigLaw-Bench.
On GDPval-AA, the evaluation of economically valuable tasks in finance and law mentioned earlier, Opus 4.6 again finishes 144 Elo points ahead of its industry rival, OpenAI's GPT-5.2.
Whether it's complex legal or financial expertise or demanding academic research, the depth of its analysis and understanding has reached the level of current top models.
Remarkably, this leap in capability doesn't come at the expense of safety.
In the automated behavioral evaluations that Anthropic places great weight on, Opus 4.6 shows a very high degree of alignment, while undesirable behaviors such as deception and sycophancy remain very rare.
Opus 4.6 even tackles the "over-refusal" problem that is widespread in AI today:
compared with all previous models, it refuses normal, harmless requests far less often.
Currently, Opus 4.6 is available on the official website, via the API, and on all major cloud platforms.
Pricing is unchanged: $5 per million input tokens and $25 per million output tokens.
However, for the 1M-token-context beta, additional charges apply once the input prompt exceeds 200k tokens.
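To put the rates in perspective, here is a back-of-the-envelope calculation with made-up request sizes; the surcharge for prompts above 200k tokens isn't quantified in the announcement, so it is left out:

```python
# Cost estimate at the listed rates: $5 per million input tokens, $25 per million output tokens.
input_tokens = 150_000    # hypothetical prompt size (below the 200k surcharge threshold)
output_tokens = 4_000     # hypothetical response size

input_cost = input_tokens / 1_000_000 * 5.00     # $0.75
output_cost = output_tokens / 1_000_000 * 25.00  # $0.10
print(f"Estimated request cost: ${input_cost + output_cost:.2f}")  # $0.85
```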
IMPORTANT!
To use Opus 4.6, the model identifier "claude-opus-4-6" must be specified explicitly when calling the API.
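For example, a direct HTTP call that passes the identifier in the request body. This is a sketch: the endpoint and header names follow Anthropic's public Messages API docs, but double-check the current anthropic-version value before relying on it:

```python
import os
import requests

# Direct call to the Messages endpoint with the explicit model identifier.
resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-opus-4-6",   # the identifier mentioned above
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello, Opus 4.6!"}],
    },
)
print(resp.json()["content"][0]["text"])
```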