
Anthropic just upended a $50 billion industry overnight. The end of code auditing has arrived.

新智元 · 2026-03-10 21:26
A nightmare in the $50 billion market

Just now, there was an earthquake in the global AI circle: a new feature from Anthropic has directly disrupted the traditional code auditing industry, a market worth $50 billion. Traditional security vendors charging $50,000 a year in fees have been undercut overnight; the new tool costs as little as $15 per review.

Just now, Anthropic has made another move!

The creator of Claude Code officially announced a major update: Claude Code now has a built-in code review feature.

This time, the target is a $50 billion industry: code security auditing.

The new feature just released by Anthropic is challenging the entire code security industry in an extremely simple and straightforward way.

Some people exclaimed: The industry worth $50 billion has been overturned by Anthropic overnight!

Now, we can just wait for the security stocks to plummet.

At Anthropic, the system has been run on almost every PR.

After months of testing, the results are as follows:

The proportion of PRs with substantial review comments has increased from 16% to 54%.

The proportion of review results considered incorrect by engineers is less than 1%.

In large pull requests (over 1,000 lines), issues were surfaced in 84% of PRs, with an average of 7.5 issues per PR.

Currently, this feature has launched as a research preview in beta for Claude Team and Enterprise plans.

A nightmare for the $50 billion market

Anthropic's product has set off an earthquake in the global AI circle and the application security (AppSec) industry, one that will go down in history.

Senior developers are exclaiming that the code auditing industry worth $50 billion has been uprooted!

This is because, in the past, large companies paid traditional security vendors (such as Snyk and Checkmarx) $50,000 or more in annual licensing fees, and hired professional teams to scan and audit code, all to keep bugs and security vulnerabilities out of production.

Now, Claude can directly send a team of AI agents to lurk in your PR, on standby 24/7.

Moreover, measured in tokens, the average cost of a single review is only $15 to $25!

From $50,000 down to $25: a difference of 2,000 times.
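The arithmetic behind that claim is worth a quick sanity check. The sketch below uses only the figures quoted in the article; the PR volume of 40 per week is an illustrative assumption, not a number from Anthropic.

```python
# Back-of-the-envelope comparison using the article's figures.
# The 40 PRs/week volume is an illustrative assumption.
annual_license_fee = 50_000   # traditional AppSec vendor, USD/year
cost_per_ai_review = 25       # upper end of the quoted $15-$25 token cost

ratio = annual_license_fee / cost_per_ai_review
print(f"One annual license buys {ratio:.0f} AI reviews")  # 2000

# Put differently: a team merging 40 PRs a week runs ~2,080 reviews a year,
# which at $25 each costs roughly one vendor license.
weekly_prs = 40
annual_ai_cost = weekly_prs * 52 * cost_per_ai_review
print(f"Annual AI review cost at 40 PRs/week: ${annual_ai_cost:,}")  # $52,000
```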

This is not just a feature update; it's the clarion call for the end of traditional code auditing.

Code Review, the most painful part for developers

If you ask any engineering team: What is the biggest bottleneck in software development?

I believe many people will answer code review.

In the past few years, the ability of AI to write code has advanced by leaps and bounds. Whether it's GitHub Copilot, Cursor, Claude Code, or ChatGPT, developers using these tools have seen a direct surge in the amount of code they write.

As a result, a problem has emerged - although the code is being produced at a rapid pace, the number of people reviewing the code has not increased.

Anthropic found that in the past year, the code output of each engineer has increased by 200%, but many PRs (Pull Requests) have only been quickly glanced at.

Even developers themselves admit that many code reviews are just going through the motions.

As a result, a large number of bugs, vulnerabilities, and logical issues have been brought into the production environment.

This is why many enterprises are willing to spend a fortune on security scanning tools.

However, a problem arises - these tools are not smart.

What exactly are the problems with traditional code scanning tools?

If you have used traditional AppSec tools, such as Snyk, Checkmarx, Veracode, SonarQube, etc., you will probably have this feeling: there are too many false alarms.

The reason is that most of these tools are based on static rules and known vulnerability libraries. They can scan the code but cannot truly understand it.

A common scenario is that the tool alerts that "there may be a SQL injection risk," but after developers check for a long time, they find that there is no problem.

So, people gradually start to ignore the warnings, and real and dangerous problems are often overlooked.

Therefore, enterprises still need a large amount of manual code review, and what Anthropic has done this time is to automate it.

Anthropic throws out an AI code review army

This time, the idea behind Claude Code Review is actually very simple.

In Claude Code, the system can automatically analyze Pull Requests and conduct checks from multiple perspectives, such as:

Whether the code complies with project rules

Whether there are potential bugs

Whether the changes conflict with the historical code logic

Whether the problems raised in previous PRs reappear

Finally, it outputs two kinds of results: a high-signal summary comment and inline comments pointing to the specific code locations.

That is to say, when you open a PR, you see an AI review report surfacing the truly important issues, rather than dozens of pages of raw findings.
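The shape of that output, one summary plus line-anchored comments, can be sketched as a simple data model. This is purely illustrative: the class and field names below are hypothetical, not Anthropic's actual schema, and the sample findings are invented.

```python
from dataclasses import dataclass, field

# Hypothetical model of a review result: one high-signal summary comment
# plus inline comments anchored to specific files and lines.
@dataclass
class InlineComment:
    path: str
    line: int
    severity: str   # e.g. "blocking", "minor", "pre-existing"
    message: str

@dataclass
class ReviewResult:
    summary: str
    inline: list = field(default_factory=list)

review = ReviewResult(
    summary="1 blocking issue, 1 minor issue found in this PR.",
    inline=[
        InlineComment("app/db.py", 42, "blocking",
                      "User input is interpolated into a SQL string."),
        InlineComment("app/utils.py", 7, "minor",
                      "Off-by-one in pagination when page size divides evenly."),
    ],
)

blocking = [c for c in review.inline if c.severity == "blocking"]
print(f"{len(review.inline)} inline comments, {len(blocking)} blocking")
```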

The era of "AI writing code and AI reviewing code" has finally arrived.

The first signs of Claude closing the loop on itself, reviewing its own output, have emerged.

As AI capabilities grow ever more powerful, the only role left for humans may be flipping the AI switch; one day the keyboard may need only a single Claude button.

Multi-agent system: the Claude Code review army is on the move

The biggest feature of Claude Code Review is that it is not a single AI but a team.

When a PR is created, the system will automatically launch a team of AI agents.

It is reported that Claude's new code review function will dispatch multiple AI "review agents" to work in parallel, with each agent responsible for different types of checks.

These agents filter out false alarms through verification and rank errors by severity. The final results are presented on the PR as a high-signal comprehensive comment plus inline comments on specific errors.

The scale of the review will be adjusted according to the size of the PR.

Large or complex changes receive more agents and a more in-depth review; minor changes pass through quickly. According to Anthropic's tests, the average review takes about 20 minutes.

Finally, through mutual verification among multiple agents, false alarms can be reduced.
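The fan-out-and-merge pattern described here can be sketched in a few lines: specialist agents review the same diff in parallel, overlapping findings are deduplicated as a stand-in for mutual verification, and the result is sorted by severity. The agents below are mocked stand-in functions, not Anthropic's actual review agents.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in "agents": each returns (severity, message) findings for a diff.
def logic_agent(diff):
    return [("blocking", "loop never terminates when list is empty")]

def security_agent(diff):
    return [("blocking", "loop never terminates when list is empty"),
            ("minor", "secret logged at debug level")]

AGENTS = [logic_agent, security_agent]
SEVERITY_RANK = {"blocking": 0, "minor": 1, "pre-existing": 2}

def review(diff):
    # Fan out: run every agent on the same diff in parallel.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda agent: agent(diff), AGENTS)
    # Merge: dedup findings reported by multiple agents (a crude stand-in
    # for the mutual-verification step), then rank by severity.
    findings = {f for agent_findings in results for f in agent_findings}
    return sorted(findings, key=lambda f: SEVERITY_RANK[f[0]])

for severity, message in review("...diff..."):
    print(severity, message)
```

Note that the duplicated blocking finding survives as a single entry, which is the sense in which cross-agent agreement raises signal rather than volume.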

In this process, it will focus on finding logical errors, security vulnerabilities, edge-case defects, and hidden regression issues.

All discovered issues will be marked by severity.

Red dots indicate blocking issues, i.e., bugs that should be fixed before merging the code;

Yellow dots indicate minor issues, which are recommended to be fixed but will not prevent merging;

Purple dots indicate pre-existing issues, i.e., bugs not introduced by this PR.
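The three tiers above map naturally onto a small enum. The names and the merge-gating helper below are assumptions for the sketch; as the article notes, the real comments never approve or block a merge on their own.

```python
from enum import Enum

# Illustrative encoding of the three severity tiers described above.
class Severity(Enum):
    BLOCKING = "red"         # should be fixed before merging
    MINOR = "yellow"         # fix recommended, does not prevent merging
    PRE_EXISTING = "purple"  # bug not introduced by this PR

def fix_before_merge(sev: Severity) -> bool:
    """Whether the guidance suggests fixing this before merge (hypothetical rule)."""
    return sev is Severity.BLOCKING

findings = [Severity.MINOR, Severity.BLOCKING, Severity.PRE_EXISTING]
# Sort so the most urgent findings surface first.
order = [Severity.BLOCKING, Severity.MINOR, Severity.PRE_EXISTING]
findings.sort(key=order.index)
print([f.value for f in findings])  # ['red', 'yellow', 'purple']
```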

Each review comment also includes a collapsible extended-reasoning section.

When expanded, you can see:

Why Claude marked the issue

How it verified that the issue actually exists

It should be noted that these comments will not automatically approve or block PR merging, so they will not disrupt the existing code review process.

By default, Claude Code Review mainly focuses on code correctness.

That is to say, it focuses on checking:

Bugs that will cause production environment failures

Actual logical issues

It will not focus on issues such as code format, style preferences, or lack of tests.

If users want to expand the scope of the check, they need to configure it.

The internal test results are terrifying

Anthropic's internal test results are terrifying! They further prove that traditional code review has largely been a formality.

The internal data is really shocking: only 16% of PRs received substantial review comments.

In large PRs of over 1,000 lines, issues were flagged in 84% of them, with an average of 7.5 bugs caught per PR.

Why? The reason is that engineers are too busy.

In the past year at Anthropic, each engineer's code output has increased by 200%. With more and more code, who has the time to read it line by line?

After implementing this function, the proportion of PRs in the codebase with substantial repair suggestions has skyrocketed from 16% to 54%.

This means that nearly 40% of potential "shit-mountain" code used to slip through the eyes of human programmers, but now, all of it has been caught by Claude.

Even more terrifying are the small PRs of fewer than 50 lines. Previously, people assumed that a change of just a few lines couldn't possibly have problems.

As a result, 31% of them were found to have issues. One out of every three small changes hides a bug.
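The headline numbers above check out arithmetically, which is easy to verify:

```python
# Sanity-checking the article's figures: the jump from 16% to 54% of PRs
# receiving substantive comments is a 38-percentage-point gap, i.e. the
# "nearly 40%" of issues said to have slipped past human reviewers before.
before, after = 0.16, 0.54
gap = after - before
print(f"{gap:.0%} of PRs now get substantive comments that previously did not")

# For small PRs (<50 lines): a 31% issue rate is roughly one in three.
small_pr_issue_rate = 0.31
print(f"about 1 in {round(1 / small_pr_issue_rate)} small PRs hides a bug")
```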

Engineers accepted over 99% of the issues caught! Less than 1% of the results were marked as false alarms.

This accuracy rate has exceeded that of