AI produces one billion lines of code per month, a surge of 76%. Programmer forums are in an uproar: Lines of code ≠ Productivity
Curious how Silicon Valley programmers use AI to code? Greptile, an AI code review agent used by 2,000 companies, has released an annual report on AI programming based on the roughly one billion lines of code it reviews each month. The report shows a clear productivity boost from AI programming, yet programmers don't seem to feel it.
The report's most striking finding is that, with the help of AI programming, engineers' code output has skyrocketed.
The number of code lines submitted per developer per month has risen from 4,450 to 7,839, a 76% increase. For mid-sized development teams of 6-15 people, per-developer code volume has nearly doubled (up 89%), suggesting that AI programming tools are becoming an efficiency multiplier.
More notably, the median number of changed lines per file in a single commit has risen 20% (from 18 to 22 lines), indicating that code iterations are not only faster but also larger in scope. This may reflect that the code AI programming tools can modify, and the requirements they can handle, are becoming more complex.
However, most of the discussion of this report on the Hacker News (ycombinator) forum is skeptical of the claimed efficiency gains. Some commenters say that fixing the problems in AI-generated code takes a great deal of time, a subtlety such metrics never capture. Even more commenters question whether an increase in submitted code lines amounts to a real improvement in programmer productivity.
A novice programmer may need dozens of lines to implement a function that a senior programmer can write in a few. And with AI programming in the mix, how often is code deleted and rewritten? That is hard to count, but it would reflect the real efficiency gains of AI programming far better.
Another take on the link between commit volume and productivity: if all employees had the same professional ability, productivity would indeed track lines of code produced. In practice, some tasks are difficult, require few lines, and can only be completed by senior programmers, while other tasks are simple but verbose. Counting submitted lines alone treats every task as medium difficulty.
In addition, the quality of code varies from programmer to programmer, which this report does not reflect. From this perspective, each line of code should be regarded as a liability rather than an asset, and a development team needs domain experts to judge how many lines of code are actually needed.
It is like measuring a warehouse worker's productivity by items moved per hour: someone who throws things randomly into the warehouse, or moves things that never needed moving, maximizes the metric.
With AI assistance, each programmer can generate more code, but is all of that code actually needed to complete the task? The business side rarely asks. Measuring only submitted lines of code may encourage unnecessary, repetitive work.
From this perspective, "lines edited" may be a more appropriate metric for programmer productivity: each added line scores one point, and so does each deleted line. That way, shrinking the codebase through refactoring still counts as productive work.
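The "lines edited" metric described above can be sketched in a few lines of Python over the output of `git diff --numstat`. This is a minimal illustration, not part of the report; the function name and sample data are assumptions for the example.

```python
# Sketch of the "edited lines" metric: every added line and every deleted
# line counts as one point, so refactors that shrink the codebase still
# register as productive work.

def edited_lines_score(numstat_output: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each numstat line looks like: "<added>\t<deleted>\t<path>".
    Binary files report "-" for both counts and are skipped here.
    """
    score = 0
    for line in numstat_output.strip().splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary file; git provides no line counts
        score += int(added) + int(deleted)
    return score

# Hypothetical diff: 13 lines touched in app.py, 42 lines deleted from
# legacy.py, plus one binary file that contributes nothing.
sample = "10\t3\tsrc/app.py\n0\t42\tsrc/legacy.py\n-\t-\tlogo.png"
print(edited_lines_score(sample))  # 10+3+0+42 = 55
```

Note how the large deletion in `legacy.py` dominates the score: under a pure "lines added" metric, that cleanup would have counted for nothing.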
OpenAI Still Leads, but the Gap is Narrowing
Behind the leap in efficiency is a fierce reshaping of the supporting technology stack. Using SDK download volumes as its yardstick, the report finds that in the AI memory layer, mem0 leads with a 59% market share, while the vector-database field is a six-way melee: Weaviate leads at 25%, closely trailed by Chroma, Pinecone, Qdrant, and others.
In the LLMOps layer, LiteLLM's downloads have quadrupled to 41 million, and LangSmith has risen on the back of the LangChain ecosystem. This confirms a trend: model routing, monitoring, and fallback have moved from "optional" to "standard infrastructure".
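The fallback behavior that LLMOps layers such as LiteLLM standardize can be sketched without any vendor SDK: try providers in priority order and degrade to the next on failure. The provider callables below are hypothetical stand-ins, not real client code; production routers additionally match on specific error types (rate limits, timeouts) and track per-provider health.

```python
# Minimal sketch of the fallback/routing pattern: walk a priority-ordered
# list of providers and return the first successful response.
from typing import Callable, Sequence


def call_with_fallback(providers: Sequence[tuple[str, Callable[[str], str]]],
                       prompt: str) -> tuple[str, str]:
    """Return (provider_name, response), trying providers in order."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real routers filter for retryable errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Usage with fake providers: the primary "fails", so we degrade to the backup.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("rate limited")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply = call_with_fallback([("primary", flaky_primary), ("backup", backup)], "hi")
print(name, reply)  # backup echo: hi
```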
As programming workflows invoke more and more agents, operational complexity climbs steeply. LLMOps is taking on the role that Kubernetes once played for microservices.
In the model arms race, the report tracks SDK downloads of model providers from January 2022 to November 2025. The main players are OpenAI, Anthropic, and Google GenAI. OpenAI dominates the market with a steeply rising green curve: its downloads soared from almost zero in early 2022 to 130 million in November 2025, an absolute market lead.
Anthropic's growth trajectory (the red line) can only be described as rocket-like. Though it started late from a small base, its downloads have exploded since the second half of 2023, reaching 43 million in November 2025, an astonishing 1,547-fold increase since April 2023. The OpenAI-to-Anthropic ratio has shrunk from 47:1 to 4.2:1: developers are voting with their feet, migrating toward more open, controllable, and programmable interfaces.
The yellow curve is Google's, and its growth is gentler: about 13.6 million downloads in November 2025, a significant gap behind the other two.
The Parameters of Different Models Determine Their Applicable Scenarios
The report also benchmarks five mainstream models as backends for coding agents, measuring time to first token (TTFT), throughput, cost, and more. See the table below.
The table shows that Claude Sonnet 4.5 and Opus 4.5 return the first token in under 2.5 seconds, markedly better than the GPT-5 series (over 5 seconds). In interactive programming, 2 seconds is roughly the threshold between "flow" and "distraction".
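The two latency metrics above, TTFT and throughput, are straightforward to measure against any streaming completion API. The sketch below uses a simulated token generator as a stand-in for a real streaming client; the function names and timings are illustrative assumptions, not from the report's benchmark harness.

```python
# Sketch of measuring time-to-first-token (TTFT) and throughput
# (tokens per second) over a streaming response.
import time
from typing import Iterable, Iterator


def measure_stream(tokens: Iterable[str]) -> tuple[float, float]:
    """Return (ttft_seconds, tokens_per_second) for a token stream."""
    start = time.perf_counter()
    first = None
    count = 0
    for _tok in tokens:
        if first is None:
            first = time.perf_counter()  # first token arrived
        count += 1
    end = time.perf_counter()
    ttft = (first - start) if first is not None else float("inf")
    tps = count / (end - start) if count else 0.0
    return ttft, tps


def fake_stream(n: int = 20, delay: float = 0.001) -> Iterator[str]:
    """Simulated stream: each token arrives after a small fixed delay."""
    for i in range(n):
        time.sleep(delay)  # stand-in for network/inference latency
        yield f"tok{i}"


ttft, tps = measure_stream(fake_stream())
print(f"TTFT={ttft:.4f}s, throughput={tps:.0f} tok/s")
```

The same harness, pointed at real streaming endpoints, is essentially how the interactive-versus-batch trade-off in the table is quantified: a low TTFT keeps the programmer in flow, while high sustained tokens per second matters for bulk generation.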
For batch generation, GPT-5-Codex and GPT-5.1 hold a huge throughput lead, making them suitable for large-scale code generation and test-case filling in background CI/CD pipelines.
Gemini 3 Pro responds much more slowly, taking over 10 seconds to return the first token, and its output tokens per second are also low, making it a poor fit for interactive programming.
The last part of the report lists key 2025 papers on foundation models and LLM programming applications; these studies point toward the next wave of breakthroughs. For example, Self-MoA upends traditional multi-model ensembling by showing that repeated sampling and aggregation from a single model can surpass a mix of heterogeneous models, suggesting that "model diversity" may give way to "inference-path diversity". Search-R1 uses reinforcement learning to train a model to decide for itself when to search, turning the search engine into a learnable environment action rather than a static tool call. RetroLM retrieves directly at the KV-cache level, bypassing raw text and changing how large models organize memory.
However much AI-assisted programming is used, code still requires manual review before it is merged. Usage telemetry from AI programming tools cannot capture that review work, so it struggles to reflect a product's real-world experience and impact. But if you can show that an AI programming tool helps ship features faster, rather than merely pushing more lines of code through review, the tool you build will have far stronger provable value.
References:
https://www.greptile.com/state-of-ai-coding-2025
https://news.ycombinator.com/item?id=46301886
This article is from the WeChat public account "New Intelligence Yuan". Author: Peter Dong, LRST. Republished by 36Kr with permission.