AI is so good at writing code that humans can no longer keep up with reviewing it.
At three o'clock in the morning, the code repository of a fintech company is still being updated.
The engineers aren't working overtime all night; it's the AI that's still hard at work.
Since the team fully adopted AI programming tools, the company's average monthly code output has soared from 25,000 lines to 250,000 lines. Within just a few months, more than one million lines of unreviewed code have piled up in the repository. The New York Times called this phenomenon "The Big Bang of Code": the speed of code generation has far outstripped humans' ability to digest it.
It takes five minutes to generate 1,000 lines of code, but about forty minutes to get through even a cursory review of them.
For the first time, writing code has become the easiest part. The real bottleneck is now understanding and reviewing it.
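To make the gap concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes review time scales linearly with line count and that the forty-minutes-per-1,000-lines pace holds at any volume, which is a simplification; the function and constant names are illustrative.

```python
# Figures quoted in the article; names are illustrative.
GENERATE_MIN_PER_KLOC = 5   # minutes to generate 1,000 lines
REVIEW_MIN_PER_KLOC = 40    # minutes to review 1,000 lines


def review_hours(lines: int, minutes_per_kloc: float = REVIEW_MIN_PER_KLOC) -> float:
    """Hours needed to review `lines` lines at a constant pace (assumed linear)."""
    return lines / 1000 * minutes_per_kloc / 60


hours_before = review_hours(25_000)    # monthly output before AI tools
hours_after = review_hours(250_000)    # monthly output after AI tools
slowdown = REVIEW_MIN_PER_KLOC / GENERATE_MIN_PER_KLOC

print(f"Review load before: {hours_before:.1f} h/month")
print(f"Review load after:  {hours_after:.1f} h/month")
print(f"Reviewing is {slowdown:.0f}x slower than generating")
```

Under these assumptions, the monthly review load grows from roughly 17 hours to roughly 167 hours, while reviewing stays eight times slower than generating.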
The industry is, of course, trying to fill this gap. The latest models, represented by Anthropic's Opus 4.8 and the higher-tier, restricted-access Mythos/Fable series, are no longer content with writing code faster; they have begun to strengthen code understanding, cross-file reasoning, and review capabilities: tracking variable flows, identifying potential vulnerabilities, and offering context-aware modification suggestions. They are taking on a new role, shifting from programmers' assistants to reviewers' assistants.
But this doesn't make the problem disappear. AI now handles generation and is starting to take part in review as well; code production capacity keeps expanding, but understanding and accountability haven't kept up.
As tools like Claude Code and Cursor turn the "dialog box" into the main battlefield, engineers are becoming more like "prompt dispatchers" than traditional programmers. The code flood is coming, but who will be responsible for the quality? Who will be responsible for the vulnerabilities? Who will truly understand the structure of the system?
The more pointed question is: Is human review an inefficient bottleneck in the AI era, or is it the last line of defense that can't be removed?
The Code Flood: Why Does AI Produce So Much Code?
Why does AI produce code so much faster than humans can review it? There is no single cause.
AI is good at generating new code but not at reusing old code.
A 2023 study by code analysis company GitClear found that as AI programming tools like GitHub Copilot and Cursor have spread, the code duplication rate (i.e., the share of "cloned code") has risen from about 3.3% in 2020 to 7.1%. The report pointed out that AI is more inclined to "add new code blocks" than to suggest deleting, refactoring, or moving existing code.
When a developer asks for a functional component, the AI may generate five near-identical files in different parts of the project instead of suggesting that the developer reuse or refactor an existing one.
The AI is like an assistant who keeps adding furniture to a room. The sofa is broken? It gets you a new one. The wall looks dated? It paints on another layer. Whether the room ends up crowded is not its concern.
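To illustrate what a "code duplication rate" can mean in practice, here is a minimal sketch of one possible measure: the share of non-blank lines that sit inside a block of consecutive lines appearing more than once. This is a simplified, hypothetical metric for illustration only; it is not GitClear's methodology, and the function name, the `window` parameter, and the sample files are all assumptions.

```python
from collections import defaultdict


def duplication_rate(files: dict[str, str], window: int = 3) -> float:
    """Share of non-blank lines inside a block of `window` consecutive
    lines that occurs more than once across `files` (toy metric)."""
    # Normalize: strip surrounding whitespace and drop blank lines.
    normalized = {
        name: [ln.strip() for ln in text.splitlines() if ln.strip()]
        for name, text in files.items()
    }
    # Record every location at which each window of lines occurs.
    seen = defaultdict(list)
    for name, lines in normalized.items():
        for i in range(len(lines) - window + 1):
            seen[tuple(lines[i:i + window])].append((name, i))
    # Mark every line covered by a window that occurs at least twice.
    duplicated = set()
    for places in seen.values():
        if len(places) > 1:
            for name, start in places:
                duplicated.update((name, start + k) for k in range(window))
    total = sum(len(lines) for lines in normalized.values())
    return len(duplicated) / total if total else 0.0


# Two hypothetical files that differ only in the function name.
files = {
    "a.py": "def area(w, h):\n    if w < 0 or h < 0:\n        raise ValueError\n    return w * h\n",
    "b.py": "def size(w, h):\n    if w < 0 or h < 0:\n        raise ValueError\n    return w * h\n",
}
print(duplication_rate(files))  # 0.75
```

In this toy example, six of the eight lines belong to a repeated three-line block, so the rate is 0.75. Real clone detectors typically normalize identifiers and tolerate small edits, which this sketch does not attempt.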
More importantly, developers' behavior has also changed.
It takes an average of five minutes to understand an existing component, but it only takes ten seconds to let the AI generate a new component with a similar function.
The cost gap is obvious.
So more and more developers choose to "regenerate" rather than "reuse and understand", producing modules quickly but thinking less about architecture.
When the cost of understanding is higher than the cost of generation, code starts to multiply uncontrollably.
The gain in speed often comes at the expense of design rigor and architectural clarity, leaving hidden problems for future maintenance.
At the same time, AI tools are reshaping the programming interface. Take the "agent mode" of Claude Code and Cursor as an example. In agent mode, the code editor window recedes almost entirely into the background. In the past, developers wrote, debugged, and refactored code in the IDE; now, they spend more time "chatting" with the model.
Need a complex Excel report? It used to take two hours of digging through documentation. Now a one-sentence request yields a complete script in a dozen seconds.
However, the flip side of convenience is that the human brain stops acting as a filter. Developers no longer need to think deeply about details. To make the logic look "complete", the AI often adds large amounts of defensive code, boundary checks, and even over-engineered abstraction layers. As a result, the generated code runs but is long and convoluted, like an over-explained instruction manual, which greatly increases the cost of reading and reviewing it.
More extreme still is the push at the organizational level.
Tech companies like Meta have seen an internal competition called "tokenmaxxing", which encouraged engineers to compete on who could use the fewest prompts to drive the AI to generate the most lines of code. Lines of Code (LOC), an outdated and much-criticized metric, has unexpectedly been revived as a "core KPI" in the AI era.
Andrew Bosworth, the Chief Technology Officer of Meta, optimistically wrote in an internal memo: "Projects that used to require hundreds of engineers can now be completed by dozens of people. Work that used to take months can now be done in a few days." This expectation has fueled the anxiety that "humans are inferior to AI" and has driven teams to chase the "numerical prosperity" of code output rather than its intrinsic quality. The result is that a large amount of code that hasn't been fully thought through or designed is quickly submitted, flooding an already fragile review pipeline.
There is a huge gap between the "correctness" and "elegance" of the code generated by AI. To meet a complex requirement, the AI may generate multi-level nested callback functions or use obscure library features.
Different developers or the same developer at different times may generate code with very different styles and structures due to slight differences in prompts. This inconsistency makes subsequent code review like reading an anthology co-written by multiple people without a unified writing style, and the cost of understanding increases sharply.
The more fundamental challenge is that AI doesn't really understand the business context, the long-term evolution goals of the system, and the cost of technical debt. It generates a code snippet that "seems correct at the moment" rather than code that is "maintainable and evolvable in the overall system".
What Will Happen If the Code Can't Be Reviewed?
The sharp increase in code output hasn't brought the expected leap in efficiency. Instead, it has triggered a series of negative reactions, dragging developers into deeper fatigue and the quagmire of technical debt.
Are we using faster machines to create slower processes?
AI can generate thousands of lines of code in a few minutes, but manual review takes tens of minutes or even longer.
A 2023 report by security company Snyk pointed out that about 25% of the code generated by AI contains confirmed security vulnerabilities, a significantly higher proportion than the average for human-written code.
The result is a strange role reversal. More and more developers admit that the time they spend reviewing, debugging, and modifying AI-generated code now exceeds the time they used to spend writing code themselves.
Reviewing large volumes of AI-generated code is intense, dense mental labor. Developers need to constantly judge: Is the logic of this code comprehensive? Are there any hidden errors? Does it conflict with other parts of the system? Does it introduce security risks?
MIT Technology Review and other media have pointed out that continuous, high-intensity AI code review is leading to widespread professional burnout, cognitive fatigue, and psychological pressure among developers worldwide. Their brains stay in a "defensive" review mode for long stretches rather than a "creative" construction mode, and their enthusiasm for innovation and job satisfaction are being eroded.
Once code that hasn't been fully reviewed enters the repository, it becomes "technical debt" for the future. AI-generated code, with its tendency toward redundancy, tight coupling, and low readability, is often "high-interest debt" in itself. As this code accumulates, the system architecture decays. It can run in the short term but is difficult to maintain in the long term. The technical debt starts to snowball, forming a vicious cycle: the more code there is, the worse the quality; the worse the quality, the harder the modification; the harder the modification, the more new code gets generated. Eventually, the entire system may become impossible for anyone to fully understand, and every change is like walking on thin ice.
The wave of AI-generated code is also putting heavy strain on the collaborative ethics of the open-source world.
Many well-known open-source project maintainers are overwhelmed.
In 2023, Daniel Stenberg, the founder of cURL, closed the project's six-year-running bug bounty program because he was unable to handle the flood of low-quality bug reports and patches generated by AI.
Mitchell Hashimoto, the creator of another open-source tool, Ghostty, directly prohibited all contributions generated by AI and introduced a "guarantor" system based on trust. The core of open source, namely open collaboration and the sharing of knowledge, is facing a severe challenge from indiscriminate, low-quality AI contributions.
Is the Answer Also in AI?
When the problem is caused by AI, the industry naturally wonders: can AI solve it?
We put this question to a senior data operations employee at ByteDance, an expert engineer at Didi, and a software engineer at a US startup. They all said that they already use AI extensively to write code at work and believe that the combination of AI-written code and AI review is a clear trend for the future.
They said that manually checking AI-generated code is currently difficult in several ways. The code volume is large, and it takes humans a lot of time to understand its logic and writing style; there are security risks, and data structures may be leaked; and sometimes the logic is inconsistent and has to be fixed by hand.
Leading tech companies are actively making moves.
In December 2025, the star AI programming tool Cursor acquired the code review robot startup Graphite, aiming to help engineers prioritize the most sensitive and high-risk code review requests.
Large companies in China are also putting this into practice. For example, Alibaba's Tongyi Lingma AI programming assistant has been integrated into the daily work of tens of thousands of developers. Official data shows that more than half of the effective code review comments made each day are now generated automatically by AI. Even as manual review volume has declined slightly, the overall effective review volume (including AI comments) has doubled year-on-year. Together with universities, Alibaba has even open-sourced the industry's first multi-language, repository-context-aware CodeReview Benchmark, in an attempt to set industry standards for AI code review.
Startups have also seen the huge opportunity. Companies like Qodo focus on building a full-process platform of "AI code generation -> risk discovery -> automatic review -> governance and repair" and have raised substantial funding, a sign that the code quality management market is about to take off.
Among all the attempts, the most eye-catching and controversial one is Anthropic's Project Glasswing. This project initially revolved around the Claude Mythos Preview, bringing together key players in the tech and open-source ecosystems such as AWS, Apple, Google, Microsoft, Cisco, and the Linux Foundation. Later, it expanded to about 150 new institutions, covering key infrastructure areas such as power, water, healthcare, communications, and hardware.
Its goal is not only to let AI find and fix vulnerabilities but also to rehearse an answer to a question: When a powerful AI model is capable of discovering software defects at scale, how should the cybersecurity industry verify, disclose, and patch these suddenly emerging vulnerabilities?
The capabilities of the Mythos Preview are already enough to make the industry nervous. Anthropic said that in the weeks after the launch of Project Glasswing, it and about 50 partners used the Mythos Preview to discover more than 10,000 high-risk or severe-level software vulnerabilities. The case of Mozilla is more intuitive: After the Firefox team integrated the Claude Mythos Preview into its security screening process, it fixed 271 vulnerabilities in Firefox version 150 that this round of evaluation had uncovered, many of which were problems that used to take top-level security researchers a long time to reason their way to.
But the problem has come full circle. Anthropic initially didn't open the Mythos Preview to ordinary users, saying its cybersecurity capabilities were too strong and could be misused.
In June, Anthropic tried to advance on two fronts: on one hand, it launched the protected Fable 5 for a wider range of users; on the other hand, it launched the less-restricted Mythos 5 for a few security teams. But a few days later, the US government asked it to suspend foreign access to Fable 5 and Mythos 5 on the grounds of national security and export control, and Anthropic immediately closed all customers' access to these two models.
This makes Mythos a very typical contradiction in the AI era: it may be a powerful tool for patching software vulnerabilities, but it may also be used to discover and exploit vulnerabilities faster.
AI is both a spear and a shield. The tool most likely to alleviate the security problems of AI-generated code may itself become a new source of security risks.
So, is human review a bottleneck in software development in the AI era?
Maybe. But it is also the last line of defense at present.
AI is supposed to improve productivity and free humans from repetitive labor. However, at the current stage, it is increasing the workload rather than reducing it. Is AI moving too fast, or are humans not fast enough? Maybe this is a question that time needs to answer.
This article is from the WeChat official account "Zimu AI", author: Xiaojinya. Republished by 36Kr with permission.