
Jensen Huang's prophecy has come true: AI agents have become major players on GitHub, accomplishing in a day what takes humans a year.

New Intelligence Yuan (新智元) · 2025-08-05 17:49
"Software verschlingt die Welt, aber KI wird die Software verschlingen." - Die Prophezeiung von Huang Renxun, CEO von NVIDIA, wird immer schneller zur Realität.

A recent study from Queen's University in Canada reveals for the first time how AI programming agents are infiltrating open-source communities at scale.

Paper link: https://arxiv.org/abs/2507.15003

Dataset link: https://huggingface.co/datasets/hao-li/AIDev

Code link: https://github.com/SAILResearch/AI_Teammates_in_SE3
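For readers who want to explore the data themselves, below is a minimal sketch of loading the AIDev dataset with the Hugging Face datasets library; the split and column names are assumptions and should be checked against the dataset card.

```python
# Minimal sketch: exploring the AIDev dataset from Hugging Face.
# Assumption: the default configuration contains one record per pull request;
# the split name and column names are illustrative and may differ --
# check the dataset card at https://huggingface.co/datasets/hao-li/AIDev.
from datasets import load_dataset

ds = load_dataset("hao-li/AIDev", split="train")  # split name is an assumption
print(ds)      # prints the available columns and the row count
print(ds[0])   # inspect one PR record
```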

By analyzing 456,000 GitHub pull requests (PRs, i.e., code modification requests), the research team found that AI programming agents such as OpenAI Codex, GitHub Copilot, and Claude Code have gone beyond simple code completion and are now active on the front line of open-source development as genuine 'AI programmers':

They can independently initiate PRs, participate in reviews, and even 'discuss' modification plans with human developers.

This marks software engineering's official entry into the 3.0 era predicted by well-known AI scientist Andrej Karpathy: AI has been upgraded from a tool to a collaborative partner. More than 61,000 open-source projects around the world have started to accept AI programming agents as 'colleagues'.

These projects span a wide range of sizes, and their contributors include 47,000 human developers.

Among them, OpenAI Codex is the most active, having submitted 410,000 PRs (reaching 800,000 by the time of publication), earning it the title of 'the most competitive'; Devin and GitHub Copilot follow with 24,000 and 16,000 submissions respectively.
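As a rough illustration of how agent-authored PRs can be identified on GitHub (this is not the paper's pipeline), the sketch below lists a repository's pull requests via the REST API and tallies those opened by bot accounts; the set of bot logins is illustrative and incomplete.

```python
# Rough illustration, not the study's methodology: count PRs opened by known
# AI-agent bot accounts in one repository using the GitHub REST API.
# The set of bot logins below is illustrative and incomplete.
import requests

AGENT_LOGINS = {"copilot-swe-agent[bot]", "devin-ai-integration[bot]"}  # assumption

def count_agent_prs(owner: str, repo: str, token: str | None = None) -> int:
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    headers = {"Authorization": f"token {token}"} if token else {}
    count, page = 0, 1
    while True:
        resp = requests.get(url, headers=headers,
                            params={"state": "all", "per_page": 100, "page": page})
        resp.raise_for_status()
        prs = resp.json()
        if not prs:
            break
        count += sum(1 for pr in prs if pr["user"]["login"] in AGENT_LOGINS)
        page += 1
    return count
```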

Explosive Efficiency: Completing 3 Years of Work in 3 Days

The efficiency improvement brought by AI programming agents is astonishing. Data shows that GitHub Copilot can complete the core work of a code modification request in an average of only 13 minutes, much faster than the hours or even days usually required by human developers.

An even more extreme case shows that a developer submitted 164 code modifications in just 3 days with the help of OpenAI Codex, almost equivalent to his total work in the past 3 years (176 submissions).
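As a back-of-envelope comparison of throughput (day counts are approximate):

```python
# Back-of-envelope throughput comparison from the figures above.
codex_rate  = 164 / 3            # ~55 PRs per day with OpenAI Codex
manual_rate = 176 / (3 * 365)    # ~0.16 PRs per day over the prior 3 years
print(round(codex_rate / manual_rate))  # roughly a 340x jump in raw throughput
```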

This is like equipping each programmer with 100 tireless interns who can produce code 24 hours a day.

Quality Dilemma: Fast Does Not Equal Good

The research reveals a key contradiction: the acceptance rate of AI-generated code is generally lower than that of human-written code.

The code merge rate of OpenAI Codex is 65%, while that of GitHub Copilot is only 38%, and the average for human developers reaches 76%.

The gap is particularly pronounced in core feature development (feat) and bug-fix (fix) tasks, where AI acceptance rates run 15 to 40 percentage points lower.

However, AI shows unique advantages in documentation writing (docs). The acceptance rate of OpenAI Codex's documentation-related modifications is as high as 88.6%, significantly exceeding the 76.5% of human developers. The researchers speculate that document generation relies more on language ability than on complex logical reasoning, which fits the core strengths of current large language models.
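For reference, the merge rate here is simply merged PRs divided by all closed (merged or rejected) PRs. A minimal pandas sketch of this kind of per-agent, per-category breakdown, with hypothetical column names:

```python
# Minimal sketch of a per-agent, per-category merge-rate breakdown.
# Column names ("author_type", "category", "merged") are hypothetical;
# merge rate = merged PRs / all closed PRs.
import pandas as pd

prs = pd.DataFrame([
    {"author_type": "OpenAI Codex",   "category": "docs", "merged": True},
    {"author_type": "OpenAI Codex",   "category": "feat", "merged": False},
    {"author_type": "GitHub Copilot", "category": "fix",  "merged": False},
    {"author_type": "Human",          "category": "feat", "merged": True},
])

merge_rate = (
    prs.groupby(["author_type", "category"])["merged"]
       .mean()                 # fraction of closed PRs that were merged
       .mul(100).round(1)      # as a percentage
)
print(merge_rate)
```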

An even more encouraging finding is that up to 37% of GitHub Copilot PRs have undergone 'human-AI joint review': an AI tool conducts a preliminary screening, and then humans take over.

However, the new model also raises concerns: the research found that code submitted by Copilot is usually pre-reviewed by its 'fellow' AI agent (copilot-swe-agent[bot]), creating the potential review blind spot of 'insiders reviewing insiders'. The research team suggests exploring more independent review mechanisms in the future to ensure fairness.
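As a rough illustration of how such cases can be detected (again, not the study's pipeline), one can check whether a PR's reviews include a known AI reviewer via the GitHub reviews endpoint; the reviewer set below is only a starting point.

```python
# Rough illustration: does a PR's review history include a known AI reviewer?
# The reviewer set is illustrative; copilot-swe-agent[bot] is the login the
# article mentions.
import requests

AI_REVIEWERS = {"copilot-swe-agent[bot]"}

def has_ai_review(owner: str, repo: str, number: int, token: str | None = None) -> bool:
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}/reviews"
    headers = {"Authorization": f"token {token}"} if token else {}
    reviews = requests.get(url, headers=headers).json()
    return any(r.get("user") and r["user"]["login"] in AI_REVIEWERS for r in reviews)
```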

The Future Is Here: GitHub Evolves into an AI Training Ground

The research predicts that open-source platforms will evolve into 'training gyms' for AI agents. Every successful code merge becomes 'positive feedback' for reinforcement learning, and every test failure or PR rejection becomes valuable 'negative feedback'.

The ultimate goal is to cultivate mature AI programmers who can independently and reliably complete software iterations.
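Stated as code, the feedback loop the researchers envision might look roughly like the hypothetical reward mapping below; the outcome labels and reward values are illustrative, not taken from the paper.

```python
# Hypothetical sketch of PR outcomes as reinforcement-learning feedback.
# Outcome labels and reward values are illustrative, not from the paper.
def pr_reward(outcome: str) -> float:
    rewards = {
        "merged": 1.0,               # successful merge: positive feedback
        "changes_requested": -0.2,   # reviewer pushback: mild negative signal
        "tests_failed": -0.5,        # CI failure: negative feedback
        "closed_unmerged": -1.0,     # rejected PR: strong negative feedback
    }
    return rewards.get(outcome, 0.0)

print(pr_reward("merged"))           # 1.0
print(pr_reward("closed_unmerged"))  # -1.0
```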

Based on a large amount of empirical data, the research team outlines the key development directions for the era of AI programming agents:

1. Dynamic evaluation system: Abandon traditional static testing and directly evaluate AI performance in real-world project environments.

2. Failure mode analysis: Conduct in-depth analysis of rejected PRs and establish a library of common AI errors to drive improvement.

3. Latency optimization: Focus on solving the long-tail problem of response timeouts (> 1 hour) for some tasks.

4. Review burden reduction: Make AI-generated code clearer and easier to review, reducing the burden on humans.

5. Professional review AI: Develop agents specifically for code review.

6. Intelligent review resource allocation: Automatically allocate review resources according to code complexity and risk.

7. Full-cycle quality tracking: Monitor the long-term maintenance cost and defect rate of AI-generated code.

8. Requirement understanding: Improve AI's ability to understand and plan for unclear task intentions.

9. Programming language optimization: Conduct in-depth adaptation for languages that AI is good at, such as TypeScript, or develop new languages specifically for AI.

"This is not about replacing human developers but redefining their core role. In the future, programmers will be more like the conductor of a symphony orchestra, focusing on setting strategic goals and coordinating the collaboration of multiple 'AI musicians' rather than playing every note themselves."

As the number and capabilities of AI programming agents grow exponentially, the software engineering industry is at the critical point of profound change. How this revolution will reshape the development process, team collaboration, and even the industry ecosystem is worthy of our continuous attention and reflection.

Reference materials:

https://arxiv.org/abs/2507.15003

This article is from the WeChat public account "New Intelligence Yuan", author: New Intelligence Yuan, editor: LRST, published by 36Kr with authorization.