AI programming has undergone another major shift: the father of Claude Code and the founder of "Lobster" are both enthusiastically endorsing a new paradigm. Could it kill prompt engineering?
"A year ago, the way I wrote code was in an IDE, with some kind of autocompletion feature. In November last year, I uninstalled the IDE because I no longer needed it. At that time, I might have had 5 to 10 Claudes running simultaneously. What I call writing code is actually prompting Claude to write code."
Boris Cherny, an engineer at Anthropic and the creator of Claude Code, said in a recent talk: "Now, I think we've reached the next level: I no longer prompt Claude directly. I have a bunch of loops running, and they are the ones prompting Claude and deciding what to do next. My job has become writing loops. I believe this is the next shift we'll see over the coming months, maybe even for the rest of this year."
Just today, Peter Steinberger, the so-called "Father of Lobster" who now works at OpenAI, also tweeted: "You shouldn't write prompts for programming agents anymore. You should design a loop mechanism and let these loops prompt your agents." As of this writing, the post has received 1.5 million views and sparked extensive discussion among developers.
The public comments from Boris Cherny and Peter Steinberger are bringing a new paradigm to the forefront: Loop Engineering. Instead of only typing prompts into programming agents by hand, developers design loop systems that continuously prompt, schedule, and constrain those agents.
Some netizens commented that LinkedIn might soon see a new trend of "Loop Engineering." Peter later responded, "Don't worry. It'll probably take about 3 months to reach that point. After that, people will be discussing 'designing your fleet of loops'."
This reflects that the community has started to regard "writing loops" as the next level of abstraction after writing prompts. Some people summarize this change as "from prompt engineer to meta-prompt engineer."
In addition, some developers reported that the approach works in practice. "I'm building loop processes. I finally have the configuration tuned, but now I'm hitting character bloat, which burns through my quota very quickly. It's annoying, because it actually runs; if I can figure out what's going wrong, it could clearly do the same task twice as fast at a much lower cost."
Loops Are Not Mechanical Repetitions
"Isn't a loop basically just a cron job? Do they just keep telling the model to 'make this application better'?" Some developers questioned the meaning of loop engineering.
In response, some developers said that to make it truly useful, you need a feedback loop.
Think about what we need as a development team: We need to know whether a new feature works as expected, where it can be improved, what other problems users have, which workflows can be optimized, and what value optimizing these workflows can bring, etc.
For some of these questions, the LLM can access the data directly. For others, you can generate the data yourself (by conducting user interviews, creating tasks, and so on) or let the agent generate it (by running A/B tests, adding monitoring, and so on).
Just like a development team, a loop also needs OKRs or goals. For an application used by internal employees, the goals might involve improving performance, reducing error rates, or automating and simplifying workflows. For an e-commerce application, the goal might be to optimize the path from the conversion funnel to closed deals and increase revenue. Loops can also handle library upgrades (much as Dependabot does), security fixes, technical-debt management, usage analysis, QA, and more. In every case, you need a clear goal and a feedback loop that verifies the output.
When reposting the discussion, YC CEO Garry Tan cautioned against turning agents into "Foxconn factory"-style machines for repetitive labor. In his view, agents are generally intelligent, capable of reasoning, and not dangerous, so developers should give them more substantial work instead of having them repeat the same action.
Some developers pointed out that agents can do more, but the boundaries must be clear. The goal shouldn't be to watch them at every step, but to provide agents with clear context, reliable tools, auditable operation records, and safe stop conditions. In this way, they can run autonomously without turning into out-of-control "shadow automation."
Under Peter's post, another developer noted that designing a loop is only half the work. The other half is building into the loop a mechanism that can say "no," such as tests, type checking, or real-world errors. Without that feedback, the loop just keeps the agent repeating itself and confirming its own work. Peter replied that he uses a VISION.md file in his project.
These comments show that truly effective Loop Engineering is not a simple automation script but an engineering system with a closed feedback loop. A loop needs to know when to continue, when to stop, when to roll back, and when to hand over to a human; otherwise, the agent's errors can be amplified with every iteration.
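To make the continue / stop / roll back / hand-over decisions concrete, here is a minimal Python sketch of a loop controller, assuming the agent call, the verifier, and the goal check are supplied as plain functions. The names run_verified_loop, step, verify, and goal_met are hypothetical and do not come from Claude Code or any other tool; "roll back" here simply means a rejected change is never kept.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    status: str                      # "done", "escalated", or "budget_exhausted"
    iterations: int
    accepted: List[str] = field(default_factory=list)


def run_verified_loop(
    step: Callable[[int], str],             # hypothetical: one agent iteration, returns a proposed change
    verify: Callable[[str], bool],          # the check that can say "no" (tests, type checker, ...)
    goal_met: Callable[[List[str]], bool],  # hypothetical goal check over accepted changes
    max_iterations: int = 10,               # hard stop condition
    max_consecutive_failures: int = 3,      # when to hand over to a human
) -> LoopResult:
    accepted: List[str] = []
    failures = 0
    for i in range(1, max_iterations + 1):
        change = step(i)
        if verify(change):
            accepted.append(change)   # continue: keep the verified change
            failures = 0
        else:
            failures += 1             # roll back: the rejected change is discarded
            if failures >= max_consecutive_failures:
                return LoopResult("escalated", i, accepted)
        if goal_met(accepted):
            return LoopResult("done", i, accepted)
    return LoopResult("budget_exhausted", max_iterations, accepted)


result = run_verified_loop(
    step=lambda i: f"change-{i}",
    verify=lambda c: not c.endswith("3"),   # pretend the third change fails its tests
    goal_met=lambda acc: len(acc) >= 4,
)
print(result.status, result.iterations)     # done 5
```

Resetting the failure counter after each accepted change means only consecutive rejections escalate to a human; a stricter policy could count total failures instead.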
Others noted that this depends heavily on the scenario: using loops to build a web application, for example, can bloat the system, and then you need yet more loops to reduce the complexity. That is why a strict governance stack and clear specifications must be established.
Some people also asked what exactly the "loop" is: a loop in the CLI, a loop in a shell script, or just Claude looping by itself?
In fact, Claude Code had already shipped a loop feature: developers can set recurring prompts directly in the CLI so that Claude Code repeats a task at a fixed interval.
Boris Cherny also recently described his current workflow: running large numbers of AI agents in parallel for long stretches. Typically, "thousands" of agents run overnight, working continuously on deeper development tasks, and he manages them through the Claude app. The workflow hinges on two Claude Code features for continuous automation: /loop and Routines.
Cherny explained that /loop runs locally on a cron-style schedule, so agents execute tasks cyclically as planned, while Routines run recurring tasks on the server side. Because Routines live on the server, the relevant agents keep working even after the engineer closes the laptop.
The key change in loops is that they no longer depend on an external cron job or shell loop. Previously, wrapping claude -p in a script meant every call was a "cold start" with no context from the previous round. Loops instead run inside a persistent Claude Code session that retains its context window, tool permissions, and MCP connections, so the agent remembers what it did last time and picks up from there in the next round.
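The difference between a cold-start wrapper and a persistent session can be shown with a toy model, assuming a session's "context" is nothing more than a list of earlier prompts. ToySession and both loop functions are hypothetical illustrations, not Claude Code internals; a real session also carries tool permissions and MCP connections, which this sketch ignores.

```python
from typing import List


class ToySession:
    """Hypothetical stand-in for an agent session; context is just a list of past prompts."""

    def __init__(self) -> None:
        self.context: List[str] = []

    def send(self, prompt: str) -> int:
        self.context.append(prompt)
        return len(self.context)  # how many turns the agent can "see" this round


def cold_start_loop(prompts: List[str]) -> List[int]:
    # Like a script wrapping a one-shot CLI call: every round gets a brand-new session.
    return [ToySession().send(p) for p in prompts]


def persistent_loop(prompts: List[str]) -> List[int]:
    # Like a loop inside one long-lived session: earlier rounds stay in context.
    session = ToySession()
    return [session.send(p) for p in prompts]


rounds = ["check build", "read error log", "push fix"]
print(cold_start_loop(rounds))   # [1, 1, 1]
print(persistent_loop(rounds))   # [1, 2, 3]
```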
Developers can create tasks in natural language, for example: "Check every 5 minutes whether the PR build passes. If it fails, read the error log, fix the problem, and push a new commit." They can also create tasks through commands:
/loop "Summarize any new posts tagged #announcements in the team Slack channel" --interval 30m --expires 8h
When netizens asked Peter how he does it, he said only that he uses claw to monitor his Codex, without elaborating further.
Currently, although Codex has automation and scheduling capabilities, its CLI has no explicit native loop commands comparable to Claude Code's, which include cron create to set up a new scheduled loop, cron list to view all active loops in the current session, and cron delete to immediately terminate a specified loop by ID.
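The create / list / delete-by-ID operations described above amount to a small per-session registry. The sketch below is a hypothetical Python model of that bookkeeping, assuming sequential integer IDs; it does not reproduce Claude Code's actual implementation or ID format, and it performs no scheduling itself.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScheduledLoop:
    loop_id: int
    prompt: str
    interval_minutes: int


class LoopRegistry:
    """Hypothetical per-session registry mirroring create / list / delete-by-ID."""

    def __init__(self) -> None:
        self._loops: Dict[int, ScheduledLoop] = {}
        self._next_id = itertools.count(1)

    def create(self, prompt: str, interval_minutes: int) -> int:
        loop = ScheduledLoop(next(self._next_id), prompt, interval_minutes)
        self._loops[loop.loop_id] = loop
        return loop.loop_id

    def list_active(self) -> List[ScheduledLoop]:
        return list(self._loops.values())

    def delete(self, loop_id: int) -> bool:
        # Stops tracking the loop immediately; returns False for an unknown ID instead of raising.
        return self._loops.pop(loop_id, None) is not None
```

Because the registry lives in an ordinary object, discarding it (as happens when a session ends) discards every loop it holds.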
Interestingly, when a user asked Peter how to achieve this in VS Code, Peter asked back, "Who still uses VS Code these days?"
"We've gone from 'learning to write code' to 'learning to write the thing that writes code'. Somehow, this sounds both like progress and like a pyramid scheme," a netizen commented.
Developers: You Have Unlimited Token Supply, But I Don't
The idea is appealing, but in reality loop engineering consumes a lot of tokens. Both Boris Cherny and Peter Steinberger work at companies that give them almost unlimited tokens; many people in the community have far smaller budgets.
Developers Digest previously published an article warning that each loop iteration is a complete prompt execution: a loop set to run once a minute for 8 hours makes 480 API calls (60 × 8). Teams therefore need to budget for usage in advance.
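The same arithmetic generalizes to any interval and duration. This sketch assumes durations are written like the 30m and 8h values in the /loop example earlier; tokens_per_call is a hypothetical input you would have to measure for your own prompts, since no per-call figure is given here.

```python
import re

_SECONDS_PER_UNIT = {"m": 60, "h": 3600, "d": 86400}


def to_seconds(spec: str) -> int:
    """Parse durations written like '1m', '30m', '8h', or '3d'."""
    match = re.fullmatch(r"(\d+)([mhd])", spec)
    if match is None:
        raise ValueError(f"unsupported duration: {spec!r}")
    value, unit = match.groups()
    return int(value) * _SECONDS_PER_UNIT[unit]


def planned_calls(interval: str, duration: str) -> int:
    # Every iteration is a full prompt execution, so calls = duration // interval.
    return to_seconds(duration) // to_seconds(interval)


def estimated_tokens(interval: str, duration: str, tokens_per_call: int) -> int:
    # tokens_per_call must come from your own measurements.
    return planned_calls(interval, duration) * tokens_per_call


print(planned_calls("1m", "8h"))    # 480
print(planned_calls("30m", "8h"))   # 16
```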
Even Peter has no solution to the token-consumption problem. When someone pointed out that "a $20 plan simply can't do this," he replied only, "Yes. But isn't your time valuable?"
Some developers also said, "A loop can be a for loop or a while loop. Companies with abundant tokens can use while loops at will; startups with limited tokens can also use for loops to achieve the same goal, but it will take longer."
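The for-loop versus while-loop remark can be taken literally. This sketch assumes progress toward the goal can be checked after every iteration: the while loop runs until done in one go, while the for loop caps iterations per run (say, one day's token budget) and resumes where it left off. GoalCheck, run_for, and runs_needed are hypothetical names.

```python
from typing import Callable, Tuple

GoalCheck = Callable[[int], bool]  # hypothetical: "is the goal met after n iterations?"


def run_while(goal_met: GoalCheck) -> int:
    """Abundant tokens: keep iterating until the goal check passes (never stops if it can't)."""
    rounds = 0
    while not goal_met(rounds):
        rounds += 1  # one agent iteration
    return rounds


def run_for(goal_met: GoalCheck, budget: int, start: int = 0) -> Tuple[int, bool]:
    """Scarce tokens: at most `budget` iterations in this run, then stop and report."""
    rounds = start
    for _ in range(budget):
        if goal_met(rounds):
            break
        rounds += 1
    return rounds, goal_met(rounds)


def runs_needed(goal_met: GoalCheck, budget_per_run: int) -> int:
    """How many budget-limited runs (e.g. days) it takes to reach the same goal."""
    if budget_per_run < 1:
        raise ValueError("budget_per_run must be at least 1")
    runs, rounds, done = 0, 0, False
    while not done:
        rounds, done = run_for(goal_met, budget_per_run, rounds)
        runs += 1
    return runs


def needs_twelve(n: int) -> bool:
    return n >= 12  # pretend the task takes 12 iterations


print(run_while(needs_twelve))       # 12
print(runs_needed(needs_twelve, 5))  # 3
```

Both versions reach the same end state after 12 iterations; the budgeted one simply spreads them over three runs, which is the "same goal, but it will take longer" trade-off.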
Along these lines, one netizen half-jokingly told Peter: "How hypocritical. Are you saying this to people with unlimited tokens? Why make it seem like a technical problem rather than a financial one?" Peter's answer was technically correct but not very useful: a good idea that can be sold still requires human ingenuity.
In practice, Claude Code currently deals with token consumption mainly by imposing limits:
Loops support a minimum interval of 1 minute and a maximum runtime of 3 days; when the time expires, they stop automatically, avoiding unattended background processes and runaway API bills. Loops are also not a persistent background-task system: they are bound to the current Claude Code session, so closing the terminal or ending the session stops the loop. This design favors safety and predictability, preventing a forgotten task from continuing to consume API quota. Finally, Claude Code provides a switch to disable loops for users who worry about runaway automation or soaring API costs, or who don't want team members using loop-style agent workflows.
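These limits can be expressed as a small policy object. This is a hypothetical sketch, not Claude Code's code: it assumes an over-long requested lifetime is capped at 3 days (matching "stop automatically" on expiry) rather than rejected, models the disable switch as a boolean, and does not model session binding.

```python
from dataclasses import dataclass
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=1)   # limits as described in the text
MAX_LIFETIME = timedelta(days=3)


@dataclass
class LoopPolicy:
    interval: timedelta
    requested_lifetime: timedelta
    loops_enabled: bool = True        # hypothetical stand-in for the "disable loops" switch

    def effective_lifetime(self) -> timedelta:
        if not self.loops_enabled:
            raise PermissionError("loops are disabled for this user or team")
        if self.interval < MIN_INTERVAL:
            raise ValueError("interval must be at least 1 minute")
        # The loop expires on its own; it can never outlive MAX_LIFETIME.
        return min(self.requested_lifetime, MAX_LIFETIME)

    def max_runs(self) -> int:
        # timedelta // timedelta yields an int: the most iterations before expiry.
        return self.effective_lifetime() // self.interval
```

For example, LoopPolicy(timedelta(minutes=30), timedelta(hours=8)).max_runs() gives 16, while a one-minute loop requested for ten days is cut off after 3 days, or 4,320 runs.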
Beyond cost, implementing loops may also be more troublesome than it looks.
"Everyone is rushing towards loops, but debugging a state machine that has run 47 rounds is 10 times more difficult than fixing a prompt," a netizen pointed out. "The direction of loops is correct, but we've skipped a key stage: most people still can't write a reliable one-shot prompt."
Some developers who have used loops said, "It's easy to set up at the beginning, but then you'll realize there are a lot of pain points, and it's too difficult to fix them."
"Now that I think about it, I feel a bit sorry for my colleagues because I introduced and promoted loops in our organization. If we migrate to another solution now, it will consume a lot of time and resources, so we can only hold on for a while until it becomes really painful," a developer posted in the comments.
"We've also done this. We integrated it and tried to use loops in a project. Now, just exporting the data from that project and migrating it to the tool we're using now will take a lot of time and effort, and no one wants to do it. My advice is: migrate as early as possible. The longer you wait, the worse the situation will get," a developer replied.
How Did Claude Code Go from 20 Minutes to 'Several Days' in One Year?
The focus of Loop Engineering is "to keep the agent on track during long-running operation and be able to reliably judge whether it's doing things right." In this regard, Claude Code itself is a typical example.
Ash, an engineer on Anthropic's Applied AI team, recently said that the company's current direction leans toward "maximizing autonomy": encoding human judgment in the Harness instead of inserting manual intervention at unreliable steps. The team runs multiple generations in parallel, reads the failure cases, adjusts prompts, and iterates until they can let the agent run autonomously with reasonable confidence.
In the past year, Claude Code has evolved from "only being able to run continuously for about 20 minutes and prone to errors in Bash commands and string escaping" to the stage of "almost being written by Claude Code itself and being able to run continuously for several days."
Andrew, an engineer at Anthropic, pointed out that the core difficulties in letting an agent run continuously for hours or even days fall into three main categories: context, planning, and self-judgment.
First, the context window is limited. A new session leaves the agent with "amnesia," forcing it to start from scratch, while a long-running session can suffer context rot: the closer the model gets to the end of its context window, the more likely it is to develop "context anxiety" and rush to wrap up the task. Second, by default the model is not good at long-horizon planning; it may try to do everything at once or stop after implementing only half of the features. Most importantly, the model struggles to judge its own output accurately and often mistakes a half-finished product for a finished one. For example, the front-end button has appeared, but the back-end logic behind it doesn't exist.
To solve these problems, Anthropic has taken two approaches: one is to keep improving the model itself, building more long-horizon task capability into the model weights; the other is to rework the Harness outside the model, that is, the agent scaffolding around it.
As for the specific mechanism, Anthropic's early long-running agents first used an initializer agent to break vague requirements down into a set of persistent artifacts, such as feature_list.json, progress files, a Git repository, and an initialization script. The agent then ran repeatedly, each time in a fresh context window: read the current progress, start the project, pick an unfinished feature, implement it, test it, commit the code, and move on to the next round. This pattern, "a fresh context window + persistent artifacts + a verification loop," mitigates context loss and task drift in long-running tasks.
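The initializer-plus-fresh-context pattern can be sketched in Python, assuming feature_list.json holds a list of name/passes records and the progress file is plain text; the exact schema Anthropic used is not given here, so this format is an assumption. implement, test, and commit are hypothetical callbacks standing in for the agent, an end-to-end check, and a Git commit.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def init_project(root: Path, features: list[str]) -> None:
    """Initializer step: turn requirements into persistent artifacts on disk."""
    feature_list = [{"name": name, "passes": False} for name in features]
    (root / "feature_list.json").write_text(json.dumps(feature_list, indent=2))
    (root / "progress.txt").write_text("")


def _log(root: Path, line: str) -> None:
    with open(root / "progress.txt", "a", encoding="utf-8") as log:
        log.write(line + "\n")


def run_one_session(
    root: Path,
    implement: Callable[[str], None],  # hypothetical: agent writes code for one feature
    test: Callable[[str], bool],       # hypothetical: end-to-end check for that feature
    commit: Callable[[str], None],     # hypothetical: e.g. a git commit
) -> Optional[str]:
    """One round in a fresh context: all state is read back from files."""
    features = json.loads((root / "feature_list.json").read_text())
    pending = [f for f in features if not f["passes"]]
    if not pending:
        return None                    # everything verified; stop the outer loop
    feature = pending[0]               # pick one unfinished feature
    implement(feature["name"])
    if test(feature["name"]):
        feature["passes"] = True       # only a passing test marks the feature done
        (root / "feature_list.json").write_text(json.dumps(features, indent=2))
        commit(feature["name"])
        _log(root, f"done: {feature['name']}")
    else:
        _log(root, f"failed: {feature['name']}")
    return feature["name"]


def run_harness(root: Path, implement, test, commit, max_sessions: int = 20) -> bool:
    for _ in range(max_sessions):
        if run_one_session(root, implement, test, commit) is None:
            return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        init_project(Path(tmp), ["login", "logout"])
        print(run_harness(Path(tmp), print, lambda name: True, lambda name: None))
```

Nothing carries over between calls to run_one_session except the files, which is what lets each round start with a clean context.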
However, as newer models such as Opus 4.6 have become more capable, Anthropic has started to simplify this type of Harness. Opus 4.6 is better at planning and tool selection, and Sonnet 4.6 provides execution capability close to Opus at a lower cost, so a common combination is Opus for planning and Sonnet for code execution. Meanwhile, server-side context compression and million-token context windows let the model stay more coherent within a single long-running session, so it no longer has to rely on frequently starting new sessions.
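A planner/executor split amounts to routing different calls to different models. The model labels and the call_model function below are placeholders, not real model IDs or an Anthropic SDK signature; any client that takes a model name and a prompt and returns text would fit.

```python
from typing import Callable, List

# Placeholder labels, not real model IDs; substitute whatever your API expects.
PLANNER_MODEL = "planner-opus"
EXECUTOR_MODEL = "executor-sonnet"

ModelCall = Callable[[str, str], str]  # hypothetical: (model, prompt) -> completion text


def plan_then_execute(task: str, call_model: ModelCall) -> List[str]:
    """Spend the stronger model once on planning, the cheaper one on each step."""
    plan = call_model(PLANNER_MODEL, f"Break this task into steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [call_model(EXECUTOR_MODEL, f"Implement this step:\n{step}") for step in steps]
```

The design choice is that the expensive model runs once per task while the cheaper one runs once per step, so cost grows with the number of steps at the cheaper rate.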
Currently, the cutting-edge Harness model that Anthropic is experimenting with internally is the generator-evaluator-planner structure. This design draws on the idea of a generative adversarial network (GAN): the generator is responsible for building the application, and the evaluator is responsible for criticizing and scoring it. The two continuously improve the quality of the results through adversarial pressure.
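The generator-evaluator loop can be sketched as follows, assuming the evaluator returns a numeric score plus a written critique and that a planner has already turned the user's request into spec. The 0-10 scale, the 8.0 threshold, and all function names are hypothetical; the key property is that evaluate is a separate function that sees only the spec and the build, never the generator's own reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    score: float   # evaluator's grade; the 0-10 scale is an assumption
    critique: str


def generate_evaluate_loop(
    spec: str,                                # assumed to come from a planner step
    generate: Callable[[str, str], str],      # (spec, last critique) -> new build
    evaluate: Callable[[str, str], Review],   # independent judge: sees spec and build only
    threshold: float = 8.0,
    max_rounds: int = 5,
) -> Tuple[Optional[str], float, int]:
    critique = ""
    best, best_score = None, float("-inf")
    for round_no in range(1, max_rounds + 1):
        build = generate(spec, critique)
        review = evaluate(spec, build)        # not a self-check by the generator
        if review.score > best_score:
            best, best_score = build, review.score
        if review.score >= threshold:
            return build, review.score, round_no
        critique = review.critique            # adversarial pressure feeds the next round
    return best, best_score, max_rounds       # fall back to the best build seen
```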
Unlike having a single Claude Code session self-check, Anthropic gives the evaluator an independent context window and system prompt, and actually uses Playwright to