Loop Engineering: The New Toll Booth on the Loop
In June, two statements set the AI programming community on fire.
Boris Cherny from Anthropic said: "I'm no longer writing prompts for Claude. My job is to write loops." Peter Steinberger from OpenAI said: "Stop writing prompts for programming agents. Go design loops."
The two statements drew tens of millions of views. Addy Osmani, an engineering director at Google, promptly gave the trend an official name: Loop Engineering. Headlines declaring "Prompt engineering is dead" flooded the feeds.
First, let's be clear about what Loop Engineering actually is.
Strip away the new term, and a loop is pre-written "foreman" logic: it works out which tasks need doing, assigns one to the AI, and checks the result. If the result falls short, it sends the error message back and reassigns the task, repeating until the task passes or the preset attempt count or budget limit is reached.
The key difference is who carries out the step-by-step work in between: the user or the AI.
In the past, it worked like this: the user told Claude, "Write a set of CRUD interfaces for to-do items." When it finished, if the user noticed that field validation was missing, they said, "Add validation and tests," and it made the changes. This back-and-forth required the user to watch every step and keep giving instructions. That is prompt engineering: a round-by-round conversation with the model.
A loop flips this around. The user writes a short script that sets four things up front: the goal (the interface works and all tests pass), the acceptance criteria (run 'npm test'), the available tools, and when to stop (when the tests pass, or after at most 50 attempts). Then they let go. The script repeatedly prompts the model and runs the tests itself; whenever the tests fail, it feeds the error message back so the model can make changes. The whole process needs no human intervention and only calls for human attention once it finishes or gets completely stuck.
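To make the four settings concrete, here is a minimal Python sketch of such a loop. It is illustrative only: `run_loop`, `call_model`, and `run_tests` are hypothetical names, not Claude Code or Codex APIs. The model call and the test runner are passed in as plain functions (in practice they might wrap an API client and a shell call to `npm test`), and the $200 budget default is an assumed figure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    passed: bool
    attempts: int
    spent_usd: float
    last_error: Optional[str]


def run_loop(
    goal: str,
    call_model: Callable[[str], tuple[str, float]],  # hypothetical: prompt -> (patch, cost in USD)
    run_tests: Callable[[str], tuple[bool, str]],    # hypothetical: patch -> (passed, error log)
    max_attempts: int = 50,
    budget_usd: float = 200.0,                       # assumed budget cap
) -> LoopResult:
    spent = 0.0
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # The next prompt is the goal plus the last failure, if there was one.
        prompt = goal if error is None else f"{goal}\n\nPrevious attempt failed:\n{error}"
        patch, cost = call_model(prompt)
        spent += cost
        passed, log = run_tests(patch)
        if passed:
            return LoopResult(True, attempt, spent, None)
        error = log
        if spent >= budget_usd:
            # Out of budget: stop and hand control back to a human.
            return LoopResult(False, attempt, spent, error)
    # Attempt cap reached without passing.
    return LoopResult(False, max_attempts, spent, error)
```

Passing the model and the test runner in as functions keeps the control flow, which is the actual "loop", separate from any vendor SDK, so it can be exercised with fakes before real tokens are spent.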
In Boris Cherny's words, the smallest unit of work has changed: from writing a line of code, to writing a prompt, to writing a loop. The user is no longer the one writing prompts but the one writing the "thing that writes prompts".
Essentially, a loop is a state machine with fuzzy judgment. The difficulty has never been the loop itself, but the boundary conditions that prevent it from burning through $200 in an infinite loop. Remember this, as it will be on the test later.
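Since the boundary conditions are where the real engineering lives, it can help to write them out as an explicit state machine. The sketch below assumes a hypothetical `Guard` class and an arbitrary rule that the same error three times in a row means the loop is spinning; neither comes from any vendor's product. It stops on success, on budget, on attempt count, or on lack of progress.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()
    PASSED = auto()
    STUCK = auto()            # identical failure repeated: the loop is spinning
    OUT_OF_BUDGET = auto()
    OUT_OF_ATTEMPTS = auto()


@dataclass
class Guard:
    budget_usd: float = 200.0
    max_attempts: int = 50
    repeat_limit: int = 3     # hypothetical threshold for "no progress"
    spent_usd: float = 0.0
    attempts: int = 0
    errors: list[str] = field(default_factory=list)

    def step(self, passed: bool, error: str, cost_usd: float) -> State:
        """Record one round and return the loop's next state."""
        self.attempts += 1
        self.spent_usd += cost_usd
        if passed:
            return State.PASSED
        self.errors.append(error)
        if self.spent_usd >= self.budget_usd:
            return State.OUT_OF_BUDGET
        if self.attempts >= self.max_attempts:
            return State.OUT_OF_ATTEMPTS
        # The last `repeat_limit` errors are all identical: no progress is being made.
        recent = self.errors[-self.repeat_limit:]
        if len(recent) == self.repeat_limit and len(set(recent)) == 1:
            return State.STUCK
        return State.RUNNING
```

The no-progress check is the one naive loops most often lack: without it, a loop that keeps producing the identical failure runs on until the budget or the attempt cap finally stops it.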
However, the core idea is an old one. In 2023, AutoGPT tried to let the AI run loops on its own; without verification or boundaries, it ran wild and ultimately failed. In 2025, Context Engineering was endorsed by Karpathy, and in early 2026, Harness Engineering was still in the spotlight. So why did a concept with an old core and a few added control mechanisms suddenly need a new name in June 2026?
The technology is indeed evolving, but whether a new method was really necessary is better answered by looking at the business context.
The models have reached a plateau
There is a widespread feeling across the industry that the marginal surprises in the capabilities of large models are rapidly diminishing.
From GPT-4 to Claude 4 to Gemini 2, the differences developers can perceive keep narrowing. A year ago, switching models could bring a significant jump in output quality. Now the differences are more like one model having smoother phrasing and another writing more standardized comments. Benchmark numbers are still rising, but "aha moments" in production are becoming rarer.
A study by MIT in early 2026 pointed out that as investment in computing power increases, the performance gap between top-tier and lightweight models is converging, and each additional dollar of investment buys less and less improvement. Steve Eisman said on a podcast at the end of 2025 that continuing to scale up large language models (LLMs) might be a dead end. Ilya Sutskever also stated at NeurIPS 2024 that the era of pre-training is coming to an end.
However, fewer surprises in chat scenarios do not mean the models have stopped evolving everywhere. For the agent stack, the window of opportunity has only just opened. Tool calling has gone from fragile to standardized through the MCP protocol. Long-context capabilities have gone from forgetful to reliably handling millions of tokens. Self-verification has gone from the model vouching for its own work to engineering mechanisms that separate writing from checking. The models themselves haven't made an exponential leap, but the engineering infrastructure around them is now in place.
Thus, a delicate sweet spot has emerged: the models are good enough to keep loops from crashing, but not good enough to make loops redundant. If the models could reach the goal in one step, there would be no need for a paid loop wrapped around them. This is precisely the window in which Loop Engineering is being promoted.
For companies like Anthropic and OpenAI, whose valuations rest on continuous growth, the inability to differentiate their models is the most dangerous signal. The models are infrastructure; the profits come not from the models themselves but from the "toll booths". Vendors must create a premium on the pipelines the models flow through, and Loop Engineering is that new pipeline.
Vendors are starting to sell "paradigms"
From 2022 to 2024, vendors sold model capabilities. The one with the smarter model won.
Starting in 2025, the rules changed. As the differences between models narrowed, vendors began to sell "ways of using the models". Context Engineering says that the models are already smart enough, and the bottleneck lies in the usage method. You need to set up the context correctly. Harness Engineering says that the models are already smart enough, and the bottleneck lies in the usage method. You need to build a good scaffold for the agents. Loop Engineering says that the models are already smart enough, and the bottleneck lies in the usage method. You need to upgrade yourself to a loop designer.
Each round conveys the same underlying message: The models are already smart enough, and the bottleneck lies in the usage method.
This claim is not necessarily false. If the bottleneck really has shifted from the models to how they are used, it is simply a fact. The problem is how vendors use it: they quietly convert the pressure of slowing model progress into users' anxiety about their own skills. What users buy has changed from computing power to a qualification: the qualification not to be left behind.
The recent timeline of AI development looks like a form of "agenda-setting". In mid-2025, after Context Engineering was promoted by Tobi Lütke and others and endorsed by Karpathy on social media, it quickly became a dominant framework for the agent stack. In early 2026, Mitchell Hashimoto proposed Harness Engineering. In June 2026, Addy Osmani named Loop Engineering, and the term went viral across the internet.
It took about nine months from Context to Loop. Each round is endorsed by top - tier figures in the industry, and each round claims that the previous one is outdated.
The natural rhythm of technological change is slow. TCP/IP took twenty years to become widespread after it was proposed, and React took five years to dominate front-end development after its release. A real engineering paradigm shift is slow, bottom-up, and full of controversy. The line from Prompt to Context to Harness to Loop, by contrast, is fast, top-down, and in unison.
We need to be precise here. The same set of phenomena (multiple vendors acting in sync, concepts arriving in orderly succession) can be read as a carefully orchestrated plan, or as something else: several labs, using the same tools, hit the same engineering wall and naturally converged on the same solution. Convergence doesn't equal collusion. So a more cautious and defensible statement is this: the vendors may not have orchestrated this rhythm, but they are certainly making good use of it. Either way, the rhythm looks more like a brand refresh cycle than a natural shift in engineering paradigms.
What's more notable is how closely concept launches line up with product launches. On May 28th, Anthropic launched Dynamic Workflows for Claude Code, which lets the model write its own orchestration scripts and schedule hundreds of sub-agents in the background. OpenAI's Codex added continuous goals earlier in the spring. The products are ready first; then a concept arrives to ignite the market. The naming of Loop Engineering is essentially a re-auction of attention, and, tellingly, the winners are always those with the most tokens.
While users argue on X over whether Loop Engineering is just old wine in new bottles, they have already accomplished what the vendors wanted: shifting attention from "have the models improved?" to "is the new paradigm worth pursuing?".
Lock-in and cost
Loop Engineering appears to improve efficiency, but in practice it burns money on two fronts: migration costs and the running bill.
Start with lock-in. When you write prompts in SKILL.md, put acceptance rules in CLAUDE.md, and embed loop logic in Claude Code's loop and Dynamic Workflows features, you're not just using a tool; you're building a proprietary architecture. The more complex the loop and the more rules accumulate, the deeper your dependence on the system becomes.
The loop components of Anthropic and OpenAI are almost identical: Automations, Worktrees, Skills, Connectors, Sub-agents, and Memory. This close match across all six components amounts to a two-way lock-in. Unable to differentiate at the model level, the vendors create switching costs at the engineering level. If you choose Claude Code's loop system and later want to switch to Codex, you'll have to rebuild everything, and vice versa.
Informal feedback from some early-stage teams suggests that migrating out of a loop system after rolling it out across the organization takes far more time and resources than expected, and the longer a team waits, the worse it gets. The vendors' plan is not to sell API access once but to have users pay an engineering cost every year to maintain the system they've built.
More hidden than technical debt are conceptual debt and understanding debt.
Changing the concept every nine months means the team has to restructure its workflow every nine months. The context system just built under Context Engineering has to change when Harness comes along; once Harness has stabilized the scaffold, it has to change again when Loop arrives. The vendors don't pay for this rework, but the team's productivity keeps draining away during each concept switch.
Alongside this comes understanding debt at the code level. No one reads the code that loops produce in bulk, so the team's understanding of its own system keeps declining. On one hand, teams are chasing new concepts and changing their workflows; on the other, they're at a loss in front of black-box code. Addy Osmani himself has issued this warning: the faster loops produce code, the smaller the share of it that users understand. The most comfortable choice is to give up on understanding and accept whatever the loop returns.
This is not an unfounded worry. The effects of Vibe Coding in 2025 still linger. A randomized controlled trial by METR in July 2025 found that experienced developers using AI tools on complex tasks actually took 19% longer to finish. (Note that METR raised reservations about its research method in early 2026 and revised the conclusion to "it's not yet certain whether AI improves productivity"; treat this figure with that caveat.) On the security side, a 2025 Veracode report found that 45% of AI-generated code failed security tests. There was also an incident on the Lovable platform in which user data was exposed at scale.
Loops magnify these problems. Three pitfalls are especially well hidden: slacking off (claiming to have completed 50 security tasks after doing only 20), self-praise (giving one's own results high scores), and drift (the initial constraint "don't do X" quietly disappears after 47 rounds). The tests may pass while the architecture drifts; the functions may work while logical landmines lie underneath. With no one monitoring the intermediate output, no one knows where the error crept in. Debugging a state machine that has run 47 rounds is 10 times harder than fixing a prompt.
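One common mitigation for slacking off and self-praise is to keep writing and checking separate: the loop's acceptance step re-verifies every claimed task with an independent checker, and re-applies the original constraints every round so drift is caught when it happens. The sketch below is a hypothetical illustration (`audit_round` and the `eval(` pattern are invented for the example), and plain substring matching stands in for what would realistically be a linter or a test.

```python
from typing import Callable


def audit_round(
    claimed_done: list[str],
    verify: Callable[[str], bool],  # hypothetical independent checker, not the model grading itself
    output: str,
    forbidden: list[str],           # original "don't do X" constraints, re-checked every round
) -> dict:
    # Re-check every claimed task instead of trusting the agent's own report.
    verified = {t for t in claimed_done if verify(t)}
    unverified = [t for t in claimed_done if t not in verified]
    # Re-apply the constraints on this round's output; substring matching is a crude stand-in.
    violations = [pattern for pattern in forbidden if pattern in output]
    return {
        "claimed": len(claimed_done),
        "verified": len(verified),
        "unverified": unverified,
        "violations": violations,
        "accept": not unverified and not violations,
    }


# The "50 claimed, 20 actually done" case from the text.
tasks = [f"security-task-{i}" for i in range(50)]
actually_done = set(tasks[:20])
report = audit_round(tasks, lambda t: t in actually_done, "db.query(sql)", ["eval("])
print(report["claimed"], report["verified"], report["accept"])  # 50 20 False
```

The point of the design is that the agent's claim is only an input to the check, never the verdict; acceptance depends entirely on what the independent verifier and the constraint scan find.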
Ironically, the main victims of conceptual debt are mid-level developers. Top-level designers like Boris have almost unlimited tokens and complete infrastructure; for them, conceptual iteration is just one more dimension to manage. Those at the bottom, still writing prompts, haven't even entered the game. Those in the middle, who have just mastered the previous round, are confronted with the next one. They're always chasing and never catching up.
Then there's the bill, which is the most direct cost of this paradigm.
In May 2026, according to The Verge (Tom Warren), Microsoft required thousands of engineers in its Experiences + Devices division to switch from Claude Code back to GitHub Copilot CLI before the end of its fiscal year on June 30th. Microsoft's official reason was to unify the toolchain around a product it could shape together with GitHub. However, the timing, right at fiscal year-end, is widely read as cost-driven. Remember, Microsoft itself invested up to $5 billion in Anthropic through the Foundry agreement, and even it couldn't keep the bill for heavy usage under control.
The case of Uber is more straightforward. After rolling out Claude Code to about 5,000 engineers, it burned through its entire 2026 AI budget in four months. The adoption rate soared from 32% in February to 84% in March. The average monthly expenditure per person was between $150 and $250, and heavy users spent between $500 and $2,000. The CTO himself spent $1,200 in a two-hour session. The management described this as a "mind-blowing" moment.
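A back-of-the-envelope sketch shows how quickly the per-person figures add up. It assumes uniform average spend and, in the first case, that all 5,000 engineers are active, so it is an upper-bound illustration rather than Uber's actual bill; `monthly_spend` is a hypothetical helper.

```python
def monthly_spend(engineers: int, usd_per_person: float, adoption: float = 1.0) -> float:
    """Estimated monthly spend: active users times average spend per person."""
    # adoption is the share of engineers actually using the tool (0.0 to 1.0).
    return engineers * adoption * usd_per_person


low = monthly_spend(5000, 150)
high = monthly_spend(5000, 250)
print(low, high)          # 750000.0 1250000.0
print(low * 4, high * 4)  # 3000000.0 5000000.0

# With the 84% March adoption rate instead of full adoption.
print(monthly_spend(5000, 150, 0.84), monthly_spend(5000, 250, 0.84))
```

At full adoption the average-spend range works out to roughly $3 million to $5 million over four months; the 84% adoption rate lowers it proportionally, while heavy users at $500 to $2,000 a month would push the real figure higher.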
Other articles frame these numbers as a cost trap to approach with caution. From a business perspective, they are the direct result of conceptual iteration.
The essence of Loop Engineering is to move users from "calling the model on demand" to "running the model continuously". A loop runs once a minute, Dynamic Workflows run 24/7 in the cloud, and thousands of agents run in parallel overnight. Anthropic itself warns in the description of Dynamic Workflows that the feature consumes far more tokens than normal conversations and suggests trying it on small tasks first. On the surface it's a technological advance; in essence, it's an upgrade of the consumption model, from "buying electricity" to "drawing power around the clock".
This is the Jevons Paradox from economics: improved technical efficiency leads to higher total consumption. The vendors' revenue formula is simple: user retention time multiplied by call frequency multiplied by token unit price. Loop Engineering boosts the first two variables at once, turning the AI from something that "only moves when called" into something that "is always moving on its own". The more it moves, the bigger the bill.
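The formula can be sketched in a few lines. One added assumption: to make the units work, the sketch includes a tokens-per-call factor that the formula above folds into "token unit price". All inputs, including the $15-per-million-token price and both usage patterns, are hypothetical.

```python
def revenue(hours: float, calls_per_hour: float, tokens_per_call: float, usd_per_mtok: float) -> float:
    """Retention time x call frequency x (tokens per call x price per token), in USD."""
    return hours * calls_per_hour * tokens_per_call * usd_per_mtok / 1_000_000


# Hypothetical chat usage: 2 hours a day, 10 calls an hour.
chat = revenue(2, 10, 5_000, 15.0)
# Hypothetical loop usage: running all day, once a minute.
loop = revenue(24, 60, 5_000, 15.0)
print(chat, loop, loop / chat)  # 1.5 108.0 72.0
```

Holding the price and tokens per call fixed, moving from 2 hours at 10 calls an hour to a loop that fires once a minute around the clock multiplies daily revenue 72-fold, without the unit price changing at all.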
When skeptics objected that "a $20 plan simply can't cover this", Peter Steinberger replied: "That's right, but isn't time valuable?" In other words: stop counting the token cost and start counting the time cost. But the time cost is vague, emotional, and unauditable, while the token cost is clear, rigid, and deducted automatically every month. The vendors hope users will let the vague time cost paper over the clear token cost.
Conclusion
AutoGPT, which in 2023 became the first open-source project to go viral by letting the AI set its own goals and run loops to get work done, failed because it