OpenAI engineers have stopped writing code: AI writes faster than humans can review, and agents now handle development directly.
OpenAI recently said something quite astonishing in a blog post: their own engineers hardly write code anymore.
In one internal project, a million lines of code were produced in just five months, and none of them were handwritten. All were written by Codex.
And this is not a pile of scattered scripts. It's a complete internal beta of a software product built from scratch: application logic, infrastructure, tooling, documentation, and internal developer tools are almost all there.
01
Some clues to this change may be found in OpenAI's long-standing engineering culture.
Calvin French-Owen, an OpenAI engineer who worked on the Codex project, wrote a blog post after leaving the company. Although he complained that OpenAI's rapid hiring over the past year brought plenty of chaos, he also noted that a strong startup atmosphere survives inside the company: small teams, fast decision-making, and a high degree of engineer autonomy.
Moreover, at many tech giants, senior management sets the direction and teams execute. At OpenAI, there is usually no clear long-term roadmap: researchers spot problems and come up with ideas on their own, and small teams form organically around good ideas to push projects forward.
He said that good ideas that truly drive progress can come from anywhere at any time, rather than from some grand plan.
"OpenAI places great emphasis on the bottom-up approach, especially in research."
For example, Codex itself began in a small team of only a dozen people at OpenAI. That team worked almost non-stop for seven weeks, pushing Codex from an idea to launch.
The story of "OpenAI engineers don't write code" actually starts with a new bottleneck one of the company's teams hit during development.
AI coding is nothing new these days. But once Codex started generating code at scale, OpenAI's R&D team quickly ran into a new problem:
Code generation is fast enough. The slow part is humans checking the code.
Human time and attention are limited, so the likeliest bottleneck in the whole development process has become QA (quality assurance).
To solve this problem, OpenAI's engineers had a new idea: simply let Codex imitate engineers and "look at" and "use" the application by itself.
So what are OpenAI's engineers doing now if they don't write code?
The answer: design the environment, set up feedback loops, and define architectural constraints, then let the agent write the code.
The article emphasizes a sentence: "Humans steer, agents execute."
They call this Harness Engineering, which literally means "AI-harnessing engineering".
Engineers become "capability architects"
The project started in late August 2025 with the first commit to a completely empty repository.
None of the initial architecture was handwritten by engineers: the repository structure, CI configuration, formatting rules, package-manager settings, and application framework were all generated by Codex CLI calling GPT-5, guided by a small set of templates.
Even the AGENTS.md that tells the agent "how to work in this repository" was written by Codex itself.
In other words, from the moment this system was born, there was almost no manual code. The entire code repository was built step by step by the agent.
Things didn't go smoothly at first, though: the project advanced slowly, and the problem was not Codex's ability but the environment, with unclear rules, incomplete tools, and system constraints not yet in place.
One commenter hit the nail on the head:
"The most poignant line: the agent keeps making the same mistakes. That's not a capability problem; it's that you haven't written your judgment down. If you don't write it down, it will make the same stupid mistake for the hundredth time."
So when development gets stuck, the team no longer thinks "let's go tweak this piece of code". Instead, they first ask: what capability is the agent missing?
Then they turn that capability into rules the agent can understand, execute, and be forced to follow.
In other words, now that the agent can test and fix bugs on its own, the focus of engineers' work has shifted from writing code to something else: making it easier for Codex to do things right, and supplementing the agent's capabilities.
Seen this way, engineers' work has actually moved up a level. In a nutshell: break down tasks, design capabilities, and build systems so the agent can reliably produce correct code.
02
Specifically, there are roughly these things:
The first thing is making the application "readable" to AI.
Concretely, engineers connect the agent to the Chrome DevTools Protocol so that it can "touch" the UI (more on this below).
The second thing engineers need to do is write all the "tacit knowledge" into the repository, turning it into machine-readable knowledge.
For the agent, content it cannot reach at runtime simply doesn't exist. Knowledge stored in Google Docs, chat logs, or people's heads is invisible to the system.
However, you can't stuff all the rules and instructions into Codex at once. Instead, you give it a map first and let it look up the details on its own.
The research team once tried to give the agent a huge AGENTS.md file directly, but soon found it didn't work.
The main reason is that context is a scarce resource: the thicker the instruction manual, the easier it is for the truly important information to get buried, and such large documents go stale quickly and are hard to verify and maintain.
They summarized this experience as:
"Give Codex a map, not a 1,000-page instruction manual."
This schematic diagram was generated by AI
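To make "a map, not a manual" concrete, an AGENTS.md might mostly point elsewhere rather than spell everything out. Everything below is an invented sketch, not OpenAI's actual file:

```markdown
# AGENTS.md (invented sketch)

## Map: where to look things up
- Layer rules and dependency direction: docs/architecture.md
- Lint and formatting invariants: docs/style.md
- How to run the app and the tests: docs/runbook.md

## Ground rules
- Run lint and the full test suite before opening a PR.
- Never hand-edit generated files.
```

The point is that the top-level file stays short and verifiable, while details live in documents the agent can fetch on demand.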
The third thing engineers need to do is design an "AI-friendly" architecture.
AI is most efficient in a system with clear structure and well - defined boundaries. For humans, these rules may seem rigid, but for the agent, it's an efficiency multiplier.
So this team at OpenAI designed a strict architecture. Each business domain must follow a fixed hierarchy: Types → Config → Repo → Service → Runtime → UI.
The direction of dependencies is mandatory. Any violation will be automatically blocked.
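A minimal sketch of such a dependency-direction check, assuming the layer order above. The module graph here is invented for illustration; a real check would parse actual import statements:

```python
# Layer order from the article: Types -> Config -> Repo -> Service -> Runtime -> UI.
# A layer may only depend on layers at or below its own rank; any upward
# dependency is flagged and can be blocked in CI.
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {layer: i for i, layer in enumerate(LAYERS)}

def violations(deps: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (layer, dependency) pairs where a layer reaches up the stack."""
    bad = []
    for layer, targets in deps.items():
        for target in sorted(targets):
            if RANK[target] > RANK[layer]:  # depending upward is forbidden
                bad.append((layer, target))
    return bad

# Invented module graph for illustration:
deps = {
    "service": {"repo", "types"},  # fine: service depends downward
    "config": {"ui"},              # violation: config reaches up into ui
}
print(violations(deps))  # [('config', 'ui')]
```

Wiring a check like this into CI is what makes the rule "mandatory" rather than advisory.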
The fourth thing is to turn "taste" into rules.
"In the AI era, the most important human ability is taste." As large models grow ever more powerful, claims like this are everywhere.
In this blog, there is a very interesting concept: taste invariants.
It means that engineers' aesthetic judgments, such as file-size limits, naming rules, log structure, and API conventions, are all written down as lint rules.
In this way, AI will automatically follow these rules every time it writes code: "Once human taste is captured, it can be applied to every line of code."
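As a toy illustration of turning taste into lint rules, here is what two such invariants might look like. The threshold and naming convention are invented, not OpenAI's:

```python
import re

# Invented "taste invariants": a file-length cap and a snake_case naming rule.
MAX_LINES = 300
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*\.py$")

def lint(filename: str, source: str) -> list[str]:
    """Return one human-readable problem per violated taste rule."""
    problems = []
    if len(source.splitlines()) > MAX_LINES:
        problems.append(f"{filename}: exceeds {MAX_LINES} lines, split it up")
    if not SNAKE_CASE.match(filename):
        problems.append(f"{filename}: file names must be snake_case")
    return problems

print(lint("UserService.py", "pass"))  # ['UserService.py: file names must be snake_case']
```

Once a preference is encoded this way, it applies uniformly to every line the agent writes, with no reviewer attention spent on it.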
In actual development, humans mainly interact with the system through prompts: describe the task, start the agent, and then let Codex automatically generate a Pull Request.
The subsequent entire process, including code self - checking, agent review, modification based on feedback, and resubmission, is basically completed by the agent itself and keeps looping until all reviews are passed.
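The generate-review-revise loop might be sketched like this. All the function names here are hypothetical stubs standing in for Codex and the review agents, and the issue strings are invented:

```python
# Stubs: submit_pr stands in for Codex generating a PR, review for the agent
# reviewers, revise for the fix-up pass that addresses feedback.
def submit_pr(task: str) -> dict:
    return {"task": task, "issues": ["missing test", "naming violation"]}

def review(pr: dict) -> list[str]:
    return list(pr["issues"])

def revise(pr: dict, feedback: list[str]) -> dict:
    # Pretend the agent resolves the first piece of feedback each round.
    return {**pr, "issues": pr["issues"][1:]}

pr = submit_pr("add login rate limiting")
rounds = 0
while (feedback := review(pr)) and rounds < 10:  # cap rounds as a safety valve
    pr = revise(pr, feedback)
    rounds += 1

print(rounds, pr["issues"])  # 2 []
```

The cap on rounds matters in practice: without it, a loop that never converges would burn compute indefinitely.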
The fifth thing is to clean up the "garbage" produced by AI.
The article points out that fully autonomous agents have also introduced new problems.
Once almost all the code is generated by Codex, a new problem emerges: the AI keeps copying existing patterns in the codebase, including the not-so-good ones. Over time, the code style gradually "drifts".
At first the team planned to spend one day a week manually cleaning up this "AI residue", but they soon found the approach didn't scale.
Later, they wrote the engineers' experience and preferences down as a set of "golden rules", such as preferring shared utility libraries and strictly validating data structures instead of "guessing".
Then they encoded these rules directly into the repository, letting Codex automatically scan for problems and open refactoring PRs.
It's like adding a "garbage collection mechanism" to the codebase: small problems get cleaned up as they appear, and technical debt never piles up.
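A toy version of that automatic scan: each golden rule pairs a human-readable preference with a code pattern that signals a violation worth a refactoring PR. The rules and the snippet below are invented for illustration:

```python
# Invented golden rules: (preference, pattern-that-violates-it).
GOLDEN_RULES = [
    ("prefer the shared http util", "requests.get("),
    ("validate payloads instead of guessing", ".get('data', {})"),
]

def scan(files: dict[str, str]) -> list[str]:
    """Collect findings that would seed an automatic refactoring PR."""
    findings = []
    for path, source in files.items():
        for rule, pattern in GOLDEN_RULES:
            if pattern in source:
                findings.append(f"{path}: {rule}")
    return findings

files = {"billing.py": "resp = requests.get(url)"}  # illustrative snippet
print(scan(files))  # ['billing.py: prefer the shared http util']
```

A real implementation would match on syntax trees rather than raw strings, but the shape is the same: rules live in the repo, and the agent runs them continuously.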
This blog has attracted wide attention and discussion in the tech circle. Some people believe that this Harness Engineering is essentially a modern version of cybernetics: engineers no longer write code directly but design systems, rules, and feedback loops, allowing agents to complete the work automatically.
One commenter argued that this pattern has actually appeared three times in history: from the governor on Watt's steam engine, to the controllers in Kubernetes, to today's AI agents. The real change is not "machines replacing humans" but humans moving from executors to system designers and calibrators:
"You no longer turn the valve yourself, but start to steer.
Whenever this model appears, it's usually because someone has built powerful enough sensors and actuators to truly close the feedback loop at that level."
Agents are starting to take over the development process
03
Why can OpenAI's engineers stop writing code? Let's take a look at what their agents can do now.
As mentioned before, OpenAI's engineers had a new idea: simply let Codex imitate engineers and "look at" and "use" the application by itself.
First, make the agent able to "see" the application interface (UI).
They connected the Chrome DevTools Protocol to the agent's runtime environment. Now Codex can operate the page, read logs, capture the DOM, and take screenshots to observe the interface, just like a developer debugging in the browser.
This step is crucial, because an LLM on its own can't see the UI.
After connecting to DevTools, Codex is like having "eyes" and "hands":
It can observe the page through screenshots and the DOM, monitor the running status through the console and network, and can also click, input, and navigate on its own.
This schematic diagram was generated by AI
With these capabilities, the agent can reproduce bugs on its own, automatically run UI tests, and verify whether the fixes are effective.
In this way, Codex is not just writing code. It also starts to work like an automated QA engineer: testing the code it writes by itself and repeatedly fixing it until the system passes the test.
In other words, a large amount of testing and debugging work that originally needed to be done manually has been automated.
As the diagram below shows, the core step is "Loop Until Clean": keep testing, fixing, and testing again until the system has no errors.
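"Loop Until Clean" reduces to something like the following, where run_tests and apply_fix are stubs standing in for the real test suite and the agent's patches:

```python
# Stub test suite: a shared failure list that apply_fix shrinks, standing in
# for real tests and real agent-generated fixes.
failures = ["test_login", "test_logout"]

def run_tests() -> list[str]:
    return list(failures)

def apply_fix(failing_test: str) -> None:
    failures.remove(failing_test)  # pretend the agent's patch works

attempts = 0
while (failing := run_tests()) and attempts < 20:  # bound the loop
    apply_fix(failing[0])
    attempts += 1

print(attempts, run_tests())  # 2 []
```

The bound on attempts is the human-set safety valve; everything inside the loop runs without human attention.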
Second, being able to operate the UI is not enough. The agent also needs to see what's going on inside the system.
For this purpose, OpenAI connected a complete set of observability systems to Codex.
When the application is running, it generates three types of key data, which are also the most commonly used signals for engineers to troubleshoot problems:
- Logs
- Metrics
- Traces
This data is first collected by a component called Vector and then sent to the local observability system.
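Vector is configured declaratively; a minimal pipeline of this shape might look like the sketch below. The file paths and the console sink are invented for illustration; a real setup would forward to the team's local observability backend:

```toml
# Illustrative Vector pipeline: tail app logs, parse them, forward them.

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]

[transforms.parse]
type = "remap"
inputs = ["app_logs"]
source = '''
. = parse_json!(.message)
'''

[sinks.local_observability]
type = "console"
inputs = ["parse"]
encoding.codec = "json"
```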
In this way, Codex can check the system status like an engineer: which service reported an error? Which interface became slower? Where did the request get stuck?
When it finds a problem, Codex will modify the code on its own, submit a Pull Request, restart the application, rerun the task, and then observe whether the system metrics have improved.
The entire process forms an automatic feedback loop: find a problem → modify the code → run again → observe again.
It keeps repeating until the problem disappears.
In other words, Codex