How do you do Vibe Coding right? A master class from the head of Anthropic's programming agent.
If a programmer breaks their hand, has to wear a cast for two months, but can't stop working, what should they do? The answer from Erik Schluntz, a researcher at Anthropic and co-author of "Building High-Performance Agents," is: leave it all to Claude.
As AI reshapes the rules of the software industry, Vibe Coding has become an unavoidable question for enterprises aiming to multiply their productivity.
A few months ago, Schluntz drew on his experience of being forced into "fully hands-off work" to tackle a somewhat controversial topic: how do you do Vibe Coding responsibly in a production environment?
The talk was full of valuable insights and recently took off on X; one user, Movez, even praised it as better than 100 paid courses.
In the spirit of Vibe Coding, we too organized Schluntz's talk into this article with the help of AI.
Defining "Vibe Coding"
Many people equate heavy use of AI tools like Cursor or Copilot to generate code with Vibe Coding. It is not. As long as developers maintain a tight feedback loop of line-by-line modification and review with the model, it cannot be called true "vibe."
Andrej Karpathy gave a more precise definition: "Fully give in to the vibes, embrace exponentials, and forget that the code even exists."
This work mode significantly lowers the barrier to development, letting people without an engineering background build complete applications on their own. In the past, however, its success stories were mostly limited to personal games and low-risk projects. Once non-professionals bring this mode into a real production environment, things often spiral out of control: exhausted API quotas, bypassed subscription checks, even databases tampered with at random.
Why Embrace Exponential Growth?
If there are such uncontrollable factors in high-risk business environments, why promote this technique at all? The core driver is the "exponential growth" of AI capabilities.
Currently, the length of task that AI can handle independently roughly doubles every 7 months. Today, AI can reliably complete a one-hour coding task, and developers still have the energy to review the output line by line. But by next year or the year after, when AI can generate the equivalent of a day's or even a week's worth of human work in one shot, engineers who insist on traditional synchronous review and modification will become the bottleneck of the compute explosion.
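As a rough back-of-the-envelope illustration of that trend (the 7-month doubling figure is from the talk; the extrapolation below is our own sketch, not an official metric):

```python
# Project how long a task the AI can handle grows over time, assuming a
# clean 7-month doubling starting from a 1-hour task today.
def projected_task_hours(months_from_now, base_hours=1.0, doubling_months=7.0):
    """Task length after a given number of months under exponential doubling."""
    return base_hours * 2 ** (months_from_now / doubling_months)

for months in (0, 12, 24):
    print(f"{months:>2} months from now: ~{projected_task_hours(months):.1f} hour task")
```

Even this toy model shows why line-by-line review stops scaling: the reviewable output grows geometrically while human attention stays flat.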
The history of compilers is a useful reference. Early developers didn't trust compilers and still inspected the generated assembly. As systems grew, developers had to learn to trust higher-level abstractions. Looking ahead, the software engineering community likewise needs to think in advance about how to safely and responsibly accept systems generated directly by large models in production.
Finding Verifiable Abstraction Layers and the "Leaf Nodes" Strategy
The core concept of practicing Vibe Coding in a production environment is: forget that the code exists, but never lose sight of the product.
In modern enterprise management, CTOs rely on acceptance tests to manage technical experts, product managers verify functional designs by using the product, and CEOs spot-check financial models through key data slices. None of them dig into the lowest-level execution details. Software engineers likewise need to establish abstraction layers that can be verified without reading the underlying code.
The core is: Find the abstraction layer you can verify!
However, current AI coding faces a tricky obstacle: technical debt. Today there is essentially no systematic way to measure or verify technical debt other than reading the source code.
Based on this, Erik Schluntz proposes focusing on "leaf nodes" in the codebase.
These are terminal functions or peripheral components that no other module depends on. In these areas a certain amount of technical debt is acceptable, because they rarely change and will not hinder the construction of later modules. By contrast, for the system's trunk and underlying architecture, engineers still need to deeply understand the code and strictly protect its scalability.
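As a minimal sketch of what "leaf node" means here, assuming a toy dependency map (the module names and the `find_leaf_nodes` helper are hypothetical illustrations, not from the talk):

```python
# A leaf node is a module that no other module depends on: changes there
# cannot ripple outward, so some technical debt is tolerable.
def find_leaf_nodes(dependencies):
    """dependencies maps each module to the set of modules it imports."""
    all_modules = set(dependencies)
    depended_on = set()
    for imports in dependencies.values():
        depended_on |= imports
    # A leaf is any module that never appears in another module's imports.
    return all_modules - depended_on

graph = {
    "core/engine": set(),                                  # trunk: protect it
    "core/storage": set(),                                 # trunk: protect it
    "features/export_csv": {"core/engine", "core/storage"},
    "features/email_report": {"core/engine"},
}
print(sorted(find_leaf_nodes(graph)))  # both feature modules are leaves
```

The trunk modules (`core/*`) are imported by others, so they fall on the "strictly protect" side of the line.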
Notably, as model capabilities improve, the layers of code we can trust AI to take over keep extending downward. With the new model recently tested inside Anthropic, for example, the success rate of AI-generated high-quality architecture is rising, so this boundary is shifting dynamically.
Be a Full-time Product Manager for Large Models
To get high-quality engineering code out of AI, developers need to change their mindset and see themselves as Claude's product manager. Don't ask what Claude can do for you; ask what you can do for Claude.
When facing a complex development task, developers need to guide the AI the way they would onboard a new employee on day one. Simply throwing out an instruction like "implement this function" is doomed to fail. Developers need to give the AI a detailed tour of the codebase and clearly define the requirements and constraints.
Erik Schluntz emphasized his standard pre-work routine.
Before letting Claude actually start writing code, he usually spends 15 to 20 minutes interacting with it: letting the AI explore the codebase, find the relevant files, and jointly draw up a clear execution plan. He then consolidates this context and the specifications into a single prompt and lets Claude execute. Under this process, the model's task success rate rises dramatically.
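A minimal sketch of folding that pre-work into one self-contained prompt; the section names and the `build_task_prompt` helper are illustrative assumptions, not an Anthropic template:

```python
# Consolidate exploration notes, constraints, and the agreed plan into a
# single prompt, so the model never starts coding from a bare instruction.
def build_task_prompt(goal, relevant_files, constraints, plan_steps):
    sections = [
        f"Goal:\n{goal}",
        "Relevant files:\n" + "\n".join(f"- {path}" for path in relevant_files),
        "Constraints:\n" + "\n".join(f"- {c}" for c in constraints),
        "Agreed plan:\n" + "\n".join(
            f"{i}. {step}" for i, step in enumerate(plan_steps, 1)
        ),
    ]
    return "\n\n".join(sections)

prompt = build_task_prompt(
    goal="Add CSV export to the reporting page",
    relevant_files=["reports/views.py", "reports/serializers.py"],
    constraints=["Do not touch the auth module", "Keep the public API stable"],
    plan_steps=["Add a serializer", "Add the export endpoint", "Write 3 E2E tests"],
)
print(prompt)
```

The point is less the formatting than the discipline: the 15-20 minutes of joint exploration produce the material that goes into each section.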
A Case of Merging 22,000 Lines of Code in a Production Environment
In the talk, Erik Schluntz disclosed an extreme real-world case from inside Anthropic: his team recently merged a change of some 22,000 lines into the production environment of a reinforcement learning codebase, the vast majority of it written by Claude.
To responsibly complete this merge, the team adopted four core strategies:
- Deep guidance from a product manager's perspective: several days of upfront manual planning and requirements work.
- A strictly defined scope of modification: code changes were confined to leaf nodes where technical debt is acceptable.
- Manual intervention in core areas: for core logic where underlying scalability must be guaranteed, the team performed strict manual review.
- Verifiable checkpoints: long-running stress tests for system stability, plus input and output criteria for the whole system that humans can easily verify.
In this way, a huge project that would have taken human engineers two weeks to write and review line by line was compressed into a single day. When the time cost of development drops this sharply, engineers can take on the large-scale refactors and feature work that had been shelved for lack of resources.
Advanced Skills: Exploration, Testing, and Toolchain Collaboration
In a Q&A session lasting dozens of minutes, Erik Schluntz gave detailed answers to the practical questions on developers' minds, covering everything from personal growth to tool fit.
Question 1: In the past we spent a lot of time wrestling with syntax, libraries, and the glue between components, and we learned in the process. How should we learn now? How do we accumulate enough knowledge to be a good product manager for agents?
Erik: This is a good question. It's true that we will no longer go through those painful struggles. But I think that's okay, just as today's programmers don't need to write assembly by hand.
On the optimistic side, I've found that with AI tools my speed of learning new things has shot up. I often ask: "Hey Claude, I've never seen this library. Tell me about it. Why did you choose it?" With an always-available pair-programming partner, the lazy can coast, but if you're willing to invest time in learning, Claude will help you understand.
In addition, AI lets us run far more trial-and-error experiments. An architectural decision that once took two years to validate can now yield results in six months. As long as you're willing to try, an engineer can accumulate four times the experience and lessons in the same calendar time.
Question 2: In the pre - planning process, how should we balance the amount of information given to it? Is there a standardized template?
Erik: It depends on what you care about. If I don't care how it's implemented, I won't mention a single implementation detail and only give the final requirements. If I'm very familiar with this codebase, I'll go into details about which classes to use and which examples to refer to.
However, models perform best when you don't over-constrain them, so I don't recommend spending much energy on strict format templates. Just talk to it as you would a junior engineer.
Question 3: How do you balance effectiveness with security? For example, there have been reports that many vibe-coded applications built by people who don't understand code have serious vulnerabilities.
Erik: It comes back to the first point: be a good PM. You need to know what is dangerous and what is safe. Most of the vulnerabilities reported in the media were created by people who can't write code at all, which is fine for games and toy projects, but production systems require you to ask the right guiding questions. Our 22,000-line case was a completely offline task, so we were sure there was no security risk.
Question 4: Less than 0.5% of the global population understands software. What changes do existing products need so that ordinary people can build software more easily while avoiding security issues like API key leaks?
Erik: It would be great to see more products and frameworks that are "provably correct". For example, someone could build a system whose back end locks down and secures the critical authentication and payment parts, leaving only a fill-in-the-blank front-end sandbox where you can vibe code freely.
The simplest example is Claude Artifacts: it's hosted in the cloud, front-end only, with no permissions and no payments, so it's safe no matter how much you mess around. I hope someone builds more complementary tools like that.
Question 5: Do you have any tips for test-driven development (TDD)? Claude often gets stuck on tests.
Erik: TDD is extremely useful in Vibe Coding. Even if you can't follow the test cases yourself, it helps Claude stay self-consistent.
However, Claude does tend to write dead-end tests that are overly coupled to a specific implementation. My approach is to enforce a spec: "Write only 3 end-to-end (E2E) tests, covering the happy path, error scenario 1, and error scenario 2." Guiding it toward minimal end-to-end tests ensures that even I can understand them.
When doing Vibe Coding, the only code I usually look at is the test code. Only when the tests pass do I feel confident.
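The three-test spec Erik describes might look like the following sketch, written against a hypothetical `apply_discount` function (plain `assert`-based tests so the example is self-contained; in practice you would likely use a test runner like pytest):

```python
# Toy implementation so the three E2E tests below are runnable.
def apply_discount(price, code):
    if price < 0:
        raise ValueError("price must be non-negative")
    if code != "SAVE10":
        raise KeyError("unknown discount code")
    return round(price * 0.9, 2)

# Exactly three end-to-end tests: one happy path, two error scenarios,
# each simple enough for a human reviewer to understand at a glance.
def test_happy_path():
    assert apply_discount(100.0, "SAVE10") == 90.0

def test_error_unknown_code():
    try:
        apply_discount(100.0, "BOGUS")
        assert False, "expected KeyError"
    except KeyError:
        pass

def test_error_negative_price():
    try:
        apply_discount(-1.0, "SAVE10")
        assert False, "expected ValueError"
    except ValueError:
        pass

test_happy_path()
test_error_unknown_code()
test_error_negative_price()
print("all 3 E2E tests passed")
```

Because the tests exercise only inputs and outputs, they survive internal refactors, which is exactly what makes them reviewable without reading the generated code.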
Question 6: Andrej Karpathy said "embrace exponentials". What exactly does this mean? Will the model get better along every dimension we expect?
Erik: The core of exponentials is not just that things keep improving, but that the speed of improvement far exceeds our imagination.