OpenAI's First GPT-5-Powered Bug-Hunting Agent: Automatically Reads Code, Finds Vulnerabilities, and Writes Fixes
AI Coding has been popular for more than half a year, and now AI Debugging is here!
Just now, OpenAI released Aardvark, a "white-hat" agent driven by GPT-5.
This "AI security researcher" can help developers and security teams automatically discover and fix security vulnerabilities in large-scale codebases.
According to an OpenAI report, Aardvark identified 92% of known and artificially injected vulnerabilities in benchmark testing, and can locate problems that only surface under complex conditions.
Matt Knight, the vice president of OpenAI, said:
Our developers found Aardvark genuinely valuable for clearly explaining problems and guiding them toward solutions. That signal tells us we are on a meaningful path.
Moreover, it's not just OpenAI.
Throughout October, Anthropic, Google, and Microsoft each released similar white-hat agents in quick succession.
What's going on here?
Agentic AI + Automatic Vulnerability Patching
OpenAI's official description of this white-hat Aardvark is - agentic security researcher.
Aardvark's core task is to continuously analyze source code repositories to identify security vulnerabilities, evaluate exploitability, determine risk levels, and propose targeted repair solutions.
It works by monitoring code commits and changes, automatically identifying potential vulnerabilities, inferring attack paths, and generating repair suggestions.
Aardvark does not rely on traditional program analysis techniques (such as fuzzing or Software Composition Analysis, SCA). Instead, it uses the reasoning and tool-use capabilities of large language models to understand code behavior, read and analyze code, write tests, and run verifications the way a human security researcher would.
Specifically, its workflow starts from the Git repository and goes through the following steps in sequence: Threat Modeling → Vulnerability Discovery → Sandbox Verification → Codex Repair → Manual Review → Submit Pull Request.
Analysis: Conduct a comprehensive analysis of the entire repository to generate a threat model that reflects the project's security objectives and design.
Commit Scanning: When new code is committed, scan the differences in combination with the repository and the threat model; when connecting to the repository for the first time, trace back historical commits. At the same time, explain the discovered vulnerabilities and mark them in the code for easy manual review.
Verification: Once a potential vulnerability is identified, Aardvark attempts to trigger it in an isolated environment to confirm exploitability, and documents the verification steps to ensure accurate results and a low false-positive rate.
Repair: Aardvark is deeply integrated with OpenAI Codex to generate repair patches for vulnerabilities and attach them to the report for easy one-click review and application.
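The four steps above can be pictured as a simple pipeline over commits. The sketch below is purely illustrative: every function name and heuristic here is hypothetical, standing in for what OpenAI says the agent does (threat modeling, commit scanning, sandbox verification, patch proposal), not for Aardvark's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    file: str
    line: int
    description: str
    verified: bool = False
    patch: Optional[str] = None

def threat_model(repo_files):
    # Step 1: derive coarse security assumptions from the whole repo.
    # (The real agent would have an LLM summarize objectives and design.)
    return {"handles_untrusted_input": any("request" in src for src in repo_files.values())}

def scan_commit(diff_lines, model):
    # Step 2: flag risky patterns in the changed lines.
    # Toy stand-in for an LLM judgment: eval() on request-derived data.
    findings = []
    for path, lineno, text in diff_lines:
        if "eval(" in text and model["handles_untrusted_input"]:
            findings.append(Finding(path, lineno, "eval() on untrusted input"))
    return findings

def verify_in_sandbox(finding, diff_lines):
    # Step 3: confirm exploitability in isolation. Here we only re-check
    # that the flagged line exists; a real sandbox would run an exploit.
    finding.verified = any(
        p == finding.file and n == finding.line for p, n, _ in diff_lines
    )
    return finding

def propose_patch(finding):
    # Step 4: attach a candidate fix for human review (Codex's role).
    if finding.verified:
        finding.patch = "replace eval() with ast.literal_eval()"
    return finding

def run_pipeline(repo_files, diff_lines):
    model = threat_model(repo_files)
    findings = scan_commit(diff_lines, model)
    return [propose_patch(verify_in_sandbox(f, diff_lines)) for f in findings]
```

The key design point the sketch captures is that each finding must survive sandbox verification before a patch is proposed, which is how such systems keep the false-positive rate low enough for one-click review.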
Currently, Aardvark can be seamlessly integrated with GitHub, Codex, and existing development processes, providing actionable security insights without affecting development efficiency.
Internal tests show that it can not only identify security vulnerabilities but also discover logical defects, incomplete repairs, and privacy risks.
Moreover, Aardvark has been running in internal and partner projects, where it performed well and demonstrated its practical usability.
As mentioned at the beginning, it can not only conduct in-depth analysis and locate problems that only occur under complex conditions, but also achieved a 92% identification rate in benchmark tests on "golden repositories".
In addition, Aardvark has been applied to multiple open-source projects, discovering and responsibly disclosing numerous vulnerabilities, 10 of which have received CVE identifiers.
OpenAI said it will provide free scanning services for some non-commercial open-source repositories and improve the security of the entire open-source ecosystem and supply chain.
Aardvark is now in private beta; developers who need it can apply directly on the official website.
AI for Programming, AI for Repairing
As mentioned at the beginning, not only OpenAI, but other tech giants are also actively deploying Agentic AI + Code Security.
Throughout October, Google, Anthropic, and Microsoft seemed to have coordinated in advance, all taking similar action; OpenAI was actually a bit late to the party this time.
For example, on October 4th, Anthropic said it would apply Claude Sonnet 4.5 to code security tasks.
It is reported that Claude Sonnet 4.5 outperformed Opus 4.1 at discovering code vulnerabilities and other cybersecurity skills, while being cheaper and faster.
On October 6th, Google released CodeMender, which uses the Gemini Deep Think model to achieve autonomous debugging and vulnerability repair.
On October 16th, Microsoft released Vuln.AI, officially announcing the use of AI for vulnerability management. And on the last day of October, OpenAI finally caught up with this update rhythm.
(Note: All companies conducted months of testing and verification before the release.)
So, why do these giants choose to focus on AI code security at this time?
The explanations from OpenAI and the other companies are highly consistent: manual debugging and traditional automation methods (such as fuzzing) can no longer keep pace with the demand for discovering and fixing vulnerabilities in large-scale codebases.
On the one hand, there are a large number of devices, services, and codebases in enterprise-level networks. On the other hand, although AI technology can improve productivity, it is also used to quickly find vulnerabilities and generate attack code.
Therefore, in the context of a sharp increase in the number of vulnerabilities and increasingly intelligent attack methods, using AI to automatically discover and repair vulnerabilities has become a key means to ensure software security and reduce enterprise risks.
However, while the big companies are talking, some netizens have found an interesting point:
We have an Agent that can create security vulnerabilities and an Agent that can repair them. This is the best business model.
Reference Links
[1]https://x.com/OpenAI/status/1983956431360659467
[2]https://openai.com/index/introducing-aardvark/
[3]https://www.anthropic.com/research/building-ai-cyber-defenders
[4]https://deepmind.google/discover/blog/introducing-codemender-an-ai-agent-for-code-security/
[5]https://www.microsoft.com/insidetrack/blog/vuln-ai-our-ai-powered-leap-into-vulnerability-management-at-microsoft/
This article is from the WeChat official account Quantum Bit. Author: henry. Republished by 36Kr with permission.