
The open-source framework lets AI learn from historical bug fixes on GitHub, raising the bug-fix rate to 69.8% and setting a new performance record.

QbitAI · 2026-01-16 17:50
As it turns out, good programmers know to search online for other people's experience.

When human programmers hit a tricky bug, they usually search the Internet for how others have solved it before.

Although today's AI can already search the Internet, it still cannot effectively learn how to fix bugs from that online experience.

Teaching AI the workflow of human programmers may improve its bug-fixing ability, and the team behind the MemGovern project has recently achieved promising results with exactly this approach.

In automated software engineering (SWE), code agents driven by large language models have revolutionized the programming paradigm. However, they currently share a cognitive limitation: the "closed world". Existing agents often try to fix bugs from scratch, or rely only on local context within the repository, ignoring the vast store of human experience accumulated on platforms such as GitHub.

In fact, when human engineers solve complex problems, they often search open-source communities and draw on historical solutions to similar problems.

However, directly enabling agents to use this "open-world" experience is extremely challenging, because real-world Issue and Pull Request (PR) data is full of unstructured social noise, ambiguous descriptions, and fragmented information.

To break through this barrier, the open-source research community QuantaAlpha has joined forces with teams from the University of Chinese Academy of Sciences (UCAS), the National University of Singapore (NUS), Peking University (PKU), and East China Normal University (ECNU) to propose the MemGovern framework.

Rather than adopting simple retrieval-augmented generation (RAG), the framework proposes a complete "experience refinement" mechanism that transforms cluttered GitHub data into structured memories friendly to agents. Borrowing the idea of Deep Research, it also proposes an "Experiential Memory Search" strategy, closing the loop from historical experience to reusable repair logic.

Core Pain Point: Massive Data ≠ Usable Knowledge

Existing code agents (such as SWE-Agent) often do not know where to start on complex bugs because they lack historical memory. GitHub is a huge treasure trove, but directly feeding Issues and PRs to AI does not work well, for three reasons:

1. High noise: the original discussions are full of irrelevant social chatter such as "thanks" and "please merge".
2. Unstructured: logs, error messages, and repair logic from different projects are mixed together with no unified format.
3. Hard to retrieve: simple semantic matching is easily misled by surface keywords and cannot reach the underlying repair logic.

The purpose of MemGovern is to turn this raw data into "experience cards" that AI can actually use.

Experience Refinement Mechanism

MemGovern does not directly feed raw GitHub Issues and PRs to agents. Instead, it builds a hierarchical filtering and content-purification pipeline.

Hierarchical Selection: First, high-quality repository sources are selected by jointly considering star counts and maintenance activity (Issue/PR frequency). Then strict cleaning is applied at the instance level, retaining only "closed-loop" repair records that contain a complete evidence chain (problem, code, verification).
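The two-stage selection described above can be sketched roughly as follows. The thresholds and field names (`stars`, `issue_pr_per_month`, `tests_passed`, and so on) are illustrative assumptions, not the criteria actually used in the paper:

```python
# Illustrative sketch of a two-stage filtering pipeline (hypothetical
# thresholds and field names; not the authors' actual implementation).

def select_repositories(repos, min_stars=500, min_activity=10):
    """Stage 1: keep repositories with enough stars and Issue/PR activity."""
    return [r for r in repos
            if r["stars"] >= min_stars and r["issue_pr_per_month"] >= min_activity]

def select_instances(records):
    """Stage 2: keep only 'closed-loop' fixes with a full evidence chain
    (problem description, code change, verification)."""
    return [rec for rec in records
            if rec["issue_text"]        # problem description present
            and rec["patch"]            # code change present
            and rec["tests_passed"]]    # fix was verified

repos = [
    {"name": "a/lib", "stars": 1200, "issue_pr_per_month": 30},
    {"name": "b/toy", "stars": 12, "issue_pr_per_month": 1},
]
records = [
    {"issue_text": "crash on empty input", "patch": "...", "tests_passed": True},
    {"issue_text": "", "patch": "...", "tests_passed": False},
]
print([r["name"] for r in select_repositories(repos)])  # ['a/lib']
print(len(select_instances(records)))                   # 1
```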

Standardized Experience Card: This is an original design of MemGovern. Raw records are reconstructed into standardized experience cards, each explicitly decoupled into two layers:

Index Layer: It contains a standardized problem summary and key diagnostic signals (such as exception type and error signature), enabling efficient symptom-based retrieval.

Resolution Layer: It encapsulates root-cause analysis, repair strategy, patch digest, and verification method.

This structured design keeps retrieval signals cleanly separated from reasoning logic and significantly improves the usability of the knowledge. The team has so far built a knowledge base of 135,000 high-fidelity experience cards.
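One way to picture the two-layer card is as a small data structure. The field names and example values below are our own guesses at what such a card might contain, not the authors' actual schema:

```python
# Hypothetical experience-card schema: the index layer is what retrieval
# sees; the resolution layer is consulted only after a card is selected.
from dataclasses import dataclass

@dataclass
class IndexLayer:
    """Symptom-facing fields used for retrieval (illustrative)."""
    summary: str            # standardized problem summary
    exception_type: str     # e.g. "TypeError"
    error_signature: str    # e.g. key frames of the stack trace

@dataclass
class ResolutionLayer:
    """Reasoning-facing fields: how the bug was actually fixed."""
    root_cause: str
    fix_strategy: str
    patch_digest: str
    verification: str

@dataclass
class ExperienceCard:
    index: IndexLayer
    resolution: ResolutionLayer

card = ExperienceCard(
    index=IndexLayer(
        summary="ordering expression crashes the query compiler",
        exception_type="TypeError",
        error_signature="get_order_by -> resolve_expression",
    ),
    resolution=ResolutionLayer(
        root_cause="expression object passed where a field name is expected",
        fix_strategy="type-check the argument and extract the field name",
        patch_digest="add isinstance check, read the target's name attribute",
        verification="regression test on the failing queryset",
    ),
)
print(card.index.exception_type)  # TypeError
```

Decoupling the two layers means the retriever can match on symptoms alone without its signal being diluted by long repair narratives.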

Agentic Experience Search: "Search-then-Browse" Like a Human

Traditional RAG (retrieval-augmented generation) often stuffs all retrieval results into the model at once, which easily produces an overly long, noisy context. MemGovern instead adopts a Search-then-Browse mode closer to human intuition:

  • Searching

The agent first performs a broad search over the index layer based on the current bug's symptoms (such as the error stack) to quickly locate potentially relevant candidate cases.

  • Browsing

The agent then autonomously selects the most promising case and opens its detailed resolution layer. This mechanism lets the agent understand the repair logic in depth while shutting out irrelevant interference.

  • Migration and Application

The agent maps abstract repair strategies from historical cases (such as "add boundary checks") onto the current codebase, achieving knowledge transfer.
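Under simplifying assumptions (a plain lexical overlap scorer standing in for the system's actual retriever, and a dict layout standing in for the real card format), the first two steps might look like this sketch:

```python
# Minimal "search-then-browse" sketch. Scoring and card layout are
# stand-ins, not the retriever described in the paper.

def search(cards, symptoms, k=3):
    """Step 1 (Searching): rank cards by overlap between the bug's
    symptoms and the index layer only."""
    def score(card):
        idx_text = (card["index"]["summary"] + " "
                    + card["index"]["signature"]).lower()
        return sum(term.lower() in idx_text for term in symptoms)
    return sorted(cards, key=score, reverse=True)[:k]

def browse(card):
    """Step 2 (Browsing): only now open the chosen card's resolution
    layer, keeping other cards' details out of the context."""
    return card["resolution"]

cards = [
    {"index": {"summary": "TypeError in order_by", "signature": "get_order_by"},
     "resolution": {"strategy": "extract field name instead of bypassing"}},
    {"index": {"summary": "timeout in HTTP client", "signature": "read_timeout"},
     "resolution": {"strategy": "raise default timeout"}},
]

symptoms = ["TypeError", "order_by"]          # from the current error stack
candidates = search(cards, symptoms, k=1)     # broad search on symptoms
plan = browse(candidates[0])                  # inspect one resolution layer
print(plan["strategy"])  # extract field name instead of bypassing
```

The point of the two-step design is context economy: only index layers are scanned in bulk, and only one resolution layer at a time enters the agent's context.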

Experimental Evaluation: Significantly Outperforming Mainstream Baselines

The research team conducted a detailed evaluation on SWE-bench Verified. The results show that MemGovern has achieved significant improvements across all tested models.

Main Experimental Results (Pass@1 Repair Rate):

  • Claude-4-Sonnet + MemGovern

The repair rate reaches 69.8%, 3.2 percentage points above the SWE-Agent baseline.

  • GPT-4o + MemGovern

The repair rate jumps from 23.2% to 32.6%, a gain of 9.4 percentage points.

  • DeepSeek-V3 + MemGovern

The repair rate rises to 65.8%.

The experimental data clearly shows that MemGovern's improvement is robust and model-agnostic. For models with weaker base capability, the external experience MemGovern provides yields even larger performance leaps.

Ablation Experiment Verification:

  • Impact of Memory Scale

As the share of experience cards used grows from 10% to 100%, the agent's repair rate increases monotonically, demonstrating the value of large-scale experiential memory.

  • Importance of Refinement

Compared with directly using raw Issue/PR data ("Raw Experience"), the refined experience cards deliver more stable and higher performance gains, demonstrating the necessity of structured governance.

Case Analysis: How Does Experience Change the Outcome?

In a real-world bug in the Django framework (a crash triggered by order_by), the value of MemGovern is easy to see.

Traditional Agent (No Experience):

The inexperienced agent can only see the surface of the error.

It adopts a "defensive programming" strategy, simply adding a type check to bypass the error. But this actually violates the function's API contract: it returns the wrong, unprocessed object instead of the expected result.

This head-in-the-sand repair temporarily eliminates the runtime error, but downstream core functions then fail on the mismatched data type, and the test cases ultimately do not pass.

MemGovern Agent:

The agent retrieves a similar historical experience.

The "Fix Strategy" in the experience card clearly states: "Don't just bypass the object; instead, perform an explicit type check and extract the field name."

Following this guidance, the agent writes a correct repair that not only fixes the crash but also preserves the original functionality.
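As a rough illustration of the contrast between the two repairs (the actual Django patch differs; `FieldRef` and the `normalize_*` functions below are hypothetical names invented for this sketch):

```python
# Illustrative contrast between a "defensive bypass" and an explicit fix.
# All names are hypothetical; this is not the real Django code.

class FieldRef:
    """Stand-in for an expression-like ordering object."""
    def __init__(self, name):
        self.name = name

def normalize_bypass(item):
    """Inexperienced fix: silently return the unexpected object unchanged.
    The crash disappears, but the function's contract (return a plain
    field-name string) is broken, so downstream code fails later."""
    if not isinstance(item, str):
        return item          # wrong type leaks downstream
    return item.strip()

def normalize_explicit(item):
    """Experience-guided fix: type-check and extract the field name, so
    the return type stays a string as the API promises."""
    if isinstance(item, FieldRef):
        return item.name     # explicit extraction, not a bypass
    return str(item).strip()

ref = FieldRef("created_at")
print(type(normalize_bypass(ref)).__name__)   # FieldRef (contract broken)
print(normalize_explicit(ref))                # created_at
```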

Experience Reconstruction

MemGovern is more than a gain in benchmark numbers. More importantly, it charts a clear, feasible path for AI agents to make effective use of massive amounts of unstructured human debugging experience.

It shows that, once processed, the cluttered raw Issues and PRs on GitHub can serve as retrievable, verifiable, and transferable "experiential memory" rather than noise-ridden "interference data". This is a powerful paradigm for breaking agents out of the closed world and solving complex real-world bugs.

Looking ahead, the experience-reconstruction paradigm pioneered by MemGovern has potential far beyond the code domain.

This method of turning unstructured human expertise into machine-readable memory is highly general. It offers a standardized template for vertical domains such as legal consultation and medical diagnosis, which likewise rely heavily on historical cases and expert experience.

We look forward to MemGovern's ideas moving beyond code repositories to more complex knowledge work that requires "learning from history", laying the groundwork for cross-domain, general-purpose agent memory infrastructure.

Paper Title: MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Paper Link: https://arxiv.org/abs/2601.06789

Open-Source Code: https://github.com/QuantaAlpha/MemGovern

About QuantaAlpha

QuantaAlpha was founded in April 2025 and is composed of professors, postdocs, and PhD and master's students from universities including Tsinghua University, Peking University, the Chinese Academy of Sciences, Carnegie Mellon University (CMU), and the Hong Kong University of Science and Technology. Our mission is to explore the "quanta" of intelligence and lead the "alpha" frontier of agent research: from CodeAgent to self-evolving intelligence to financial and other cross-domain specialized agents, reshaping the boundaries of artificial intelligence.

In 2026, we will continue to produce high-quality research in areas such as CodeAgent (end-to-end autonomous execution of real-world tasks), DeepResearch, Agentic Reasoning / Agentic RL, self-evolution, and collaborative learning. Students interested in these directions are welcome to join us!

Team Homepage: https://quantaalpha.github.io/

This article is from the WeChat official account "QbitAI". Author: MemGovern Team. Republished by 36Kr with permission.