
One weekend + $1,100 to finish the work of 5 people in 6 months: Cloudflare uses AI to "replicate" Next.js and has put it into production.

Geekbang Technology InfoQ, 2026-04-02 17:14
Development is shifting from "writing code" to "managing AI to write code".

In 2026, when AI Coding was advancing by leaps and bounds, a question that originally sounded almost absurd suddenly became a real one: If engineers no longer write code line by line, can complex frameworks be "redone"?

Steve Faulkner, the engineering lead of Cloudflare Workers, gave a rather radical answer. With the help of AI, he "replicated" the entire Next.js over a weekend and migrated it to Vite, creating Vinext. The token cost of the whole project was only about $1,100, yet the results were striking: Vinext can already serve as a drop-in alternative to Next.js and can be deployed to Cloudflare Workers with a single command. In preliminary benchmarks, production builds were up to 4x faster and client bundles up to 57% smaller. More importantly, customers have already put it into production.

This is why Vinext quickly set off a storm in the developer community. What really shocks people is not just "how much code AI has written", but that it is starting to approach a task that was previously assumed to be achievable only by a senior engineering team with long-term investment: refactoring a mainstream front-end framework with millions of users. More subtly, this project is not targeting a marginal toy, but a complex system like Next.js, which has long been deeply tied to Node.js, Vercel, and customized build chains. In other words, this is not just a display of AI Coding skills, but an attempt to answer a more practical question: When existing frameworks become increasingly awkward in cross-runtime and cross-platform deployments, can AI directly "rewrite" them?

Recently, Steve Faulkner detailed the ins and outs of this slop fork project in a podcast with hosts Wes Bos and Scott Tolinski. They also had an in-depth discussion around AI coding workflows, Agent browsers, code quality, test-driven development, and what software tools in the AI-first era should look like. Based on the podcast video, InfoQ organized and partially edited the content.

The core points are as follows:

  • Humans still need to be responsible for setting the direction, and AI is just a tool for execution and acceleration.
  • The goal is not to write "elegant code", but to achieve compatibility, pass tests, and verify whether this path is feasible.
  • An ideal AI-native language may combine the constraint capabilities of Rust with the simplicity of Go.
  • The development experience of an Agent is different from that of humans. It doesn't need a beautiful interface, but it must have a clear structure so that it can understand the operation path. This "Agent-oriented DX" will become an important direction in the future.
  • Medicine is likely to be the next key industry, and its development path may be similar to that in the programming field: AI can handle a large amount of basic work, but experienced doctors are still needed for decision-making and guidance.

"slop fork"

Wes: Please briefly introduce yourself and your work.

Steve: I'm currently the engineering director of Cloudflare Workers, responsible for all Workers-related businesses, including agent products, containers, and the Wrangler CLI. Our team has about 80 members. I've been at Cloudflare for several years. It's important to clarify that I don't write code in my daily work. After people saw this project and my blog, many called me a "100x engineer", but I think a more accurate term would be a "100x engineering manager".

Scott: In the current stage of AI development, is this becoming a trend? Are these "100x engineering managers" the ones with real "superpowers"?

Steve: Absolutely. I believe AI is essentially an amplifier. If you know what you're doing, it can help you complete tasks faster and better. But if the direction is wrong, it will also magnify that mistake. So, humans still need to set the direction, and AI is just a tool for execution and acceleration.

Scott: Recently, people have been discussing the term "slop fork" because the code was written by AI. What's your take on this?

Steve: I find it quite interesting and have embraced it. In fact, I now say things like "I'm going to slop fork something". Some people even joked, "We should slop fork Kubernetes and rewrite it in Rust." I take the emergence of new terms like "Vibe Coding" or "Clanker" in a light-hearted way and don't feel offended. (Note: "slop fork" literally means a "garbage fork", but here it carries a self-deprecating, internet-meme flavor, punning on the act of "forking" and rewriting an existing project in a "sloppy" way with AI.)

Wes: Why did you fork Next.js and make it run on Vite?

Steve: A year ago, we were thinking about how to better support Next.js on Cloudflare. Next.js does have some issues in hosting, especially in non-Vercel or non-Node environments. Some features are highly dependent on Node and Vercel, so although it can theoretically be deployed in many places, there are compatibility issues in edge cases.

At that time, we considered implementing our own compiler compatible with the Next API. But after evaluation, we found that it would require about 5 engineers working for 6 months, which was too costly and unrealistic. So we turned to the OpenNext project and are still investing in it. PS: If you need a stable, production-verified solution, you should use OpenNext first. Later, we also tried to have an intern implement the pages router, but it didn't work.

The real turning point came from December last year to January this year when the model's capabilities suddenly improved significantly. At that time, I mainly used AI for management - related work, such as summarizing meeting minutes, tracking Jira, and aggregating internal information. I gradually realized that these models were powerful enough, so I started trying to write some code projects. I noticed that Next.js has a very comprehensive test system, so I thought: Can we directly use tests to drive the implementation? So I started this project on a Friday afternoon.

I spent a few hours on planning first and then interacted with the model repeatedly. The next morning, when I tested the app router demo, I found that it could actually run. Although it wasn't perfect, it was enough to show that this path was feasible.

Wes: If you were to start from scratch to implement Next.js on Vite, how would you plan it? How much does this process depend on your understanding of software engineering?

Steve: I do have an advantage because I'm familiar with Next.js, and our team also uses Vite in other frameworks, so I know the overall architecture. It took me a few hours to formulate the initial plan and I kept iterating with the model through OpenCode.

I used a voice-to-text tool a lot for "dumping my thoughts". I didn't rely on complex prompt techniques but kept correcting the model's output. For example, I clearly pointed out that some suggestions were out of the project scope, like removing React. This process was more like a continuous collaboration between humans and AI rather than a one-time instruction.

Scott: In the planning stage, did you mainly use Markdown to organize information? Are there any particularly effective methods?

Steve: I used Markdown for everything. Currently, it's the most effective tool, although I think it's just the best option for this stage. In the next two to three years, we may see a more native way of working with LLMs.

I maintained a main plan document and a special document for testing. Next.js has a very large test set (about 8,000 tests), and many of them are not features I need to support in the first stage. So, I spent a lot of time screening and guiding the model to select which tests to use. A key breakthrough was that instead of trying to run the original test suite directly, I asked the model to "migrate" the tests one by one. This meant migrating the tests to my own test environment and gradually implementing the corresponding functions, while using the document to track the progress of each test.
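The one-by-one migration Steve describes hinges on tracking which of the ~8,000 tests have been moved. A minimal sketch of that kind of bookkeeping, assuming a hypothetical markdown checklist format (the file layout and test names are illustrative, not Vinext's actual tracking document):

```python
# Sketch of tracking test-migration progress in a markdown checklist.
# The checklist format and the test names below are invented for
# illustration -- they are not Vinext's real tracking document.
import re

def migration_progress(markdown: str) -> tuple[int, int]:
    """Return (migrated, total) from '- [x]' / '- [ ]' checklist lines."""
    marks = re.findall(r"^- \[( |x)\] ", markdown, flags=re.MULTILINE)
    migrated = sum(1 for mark in marks if mark == "x")
    return migrated, len(marks)

doc = """\
- [x] app-router/basic-navigation
- [x] app-router/layouts
- [ ] app-router/streaming
- [ ] pages-router/getServerSideProps
"""
done, total = migration_progress(doc)
print(f"{done}/{total} tests migrated")  # 2/4 tests migrated
```

Keeping the state in a plain markdown file means both the human and the model can read and update the same progress record between sessions.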

Wes: Does the so-called "migrating tests" mean migrating to Vitest or also implementing the corresponding functions?

Steve: It's both. On the one hand, migrate the tests to Vitest and Playwright, and on the other hand, implement the corresponding functional logic.

Wes: Is this process a continuous interaction or can it run automatically for a long time?

Steve: I had OpenCode analyze the whole process. The results showed that my peak token usage was at 3 am, but I was definitely sleeping at that time, which means I did arrange a lot of tasks at night. My approach was not to write complex automatic loops, but to give it a task document, like "complete these 10 things", and then let it keep executing. It got stuck occasionally, but overall, it performed quite well.

The analysis also showed that my work pattern was "dumbbell-shaped": either short operations of a few minutes or deep work for one to two hours. This is consistent with my actual rhythm - I have two kids, and I do development in the gaps of my life. For example, after taking the kids to the park, I come home and quickly get back to the computer, give the model a nudge, and then go back to accompany the kids.

Finding a Reliable AI Workflow

Scott: How did you collect the data you mentioned earlier?

Steve: It all came from the session data of OpenCode. It stores all the information in SQLite, and I just asked the model to analyze this data.
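The kind of analysis Steve had the model run over OpenCode's SQLite store can be sketched with the standard library. The table and column names below are invented for illustration; OpenCode's real schema will differ:

```python
# Sketch of analyzing session data for peak hourly token usage,
# in the spirit of Steve's "3 am" finding. The `messages` table and
# its columns are hypothetical, not OpenCode's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (created_at TEXT, tokens INTEGER)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("2026-01-10 03:12:00", 9000),
     ("2026-01-10 03:40:00", 7000),
     ("2026-01-10 14:05:00", 2500)],
)
# Group token counts by hour of day and sort by total, descending.
rows = conn.execute(
    """SELECT strftime('%H', created_at) AS hour, SUM(tokens) AS total
       FROM messages GROUP BY hour ORDER BY total DESC"""
).fetchall()
print(rows[0])  # ('03', 16000) -- the peak hour
```

Because the data is just a local SQLite file, handing the model a query-and-summarize task over it requires no special tooling.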

Scott: Which model did you use?

Steve: I mainly used Opus 4.5 and 4.6, which generated about 99% of the code. Later, I started doing more code reviews and sometimes used Codex as an auxiliary.

Scott: Do you think there is a big difference between different models?

Steve: Many people say "Opus writes code, Codex does reviews". I also did this at first, but later I found the difference wasn't as big as I thought. In many cases, having the same model review itself is enough. I even let it enter a cycle: review the code, fix the problems, and then review itself again, iterating two or three times until there are no obvious problems.
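The review-fix-review cycle Steve describes has a simple control-flow shape. A minimal sketch, where `review` and `apply_fixes` stand in for real model calls (the stubs below are illustrative, not an actual API):

```python
# Sketch of the self-review loop: the same model reviews its own output,
# fixes are applied, and the cycle repeats until the review comes back
# clean or a round limit is hit. `review` / `apply_fixes` are stubs
# standing in for real model calls.
def review_loop(code, review, apply_fixes, max_rounds=3):
    for _ in range(max_rounds):
        problems = review(code)
        if not problems:
            break  # review passed with no findings
        code = apply_fixes(code, problems)
    return code

# Stub behavior: each fix round resolves one outstanding problem.
state = {"problems": 2}
review = lambda code: ["issue"] * state["problems"]
def apply_fixes(code, problems):
    state["problems"] -= 1
    return code + " (fixed)"

result = review_loop("draft", review, apply_fixes)
print(result)  # draft (fixed) (fixed)
```

The round limit matters: without it, a model that keeps flagging cosmetic issues would loop indefinitely, which matches Steve's "two or three iterations" heuristic.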

Wes: What was your actual configuration of OpenCode? Did you use plugins, Agents, or MCP? Did you go crazy adjusting parameters like those who do it all day?

Steve: I'm one of those "parameter-adjusting people". I recently started playing with pi and adjusted a lot. But the overall configuration of this project was very simple. I mainly used the desktop app and VS Code, rarely used the terminal interface, and didn't use much MCP or complex agents. However, we do have an agent for Vinext to handle some review work in the repository. We found that giving the agent rich context makes it more useful. The MD file of that agent was even generated by itself at the start of the project. During the process, I would tell it: Remember to update agent.md to make sure it has everything needed.

There are two MCP services that are worth having: one is Context7, which provides an open-source library index, and the other is Exa Search. They bring about a 20% improvement in experience, but not a qualitative leap.

Wes: Does AI automatically operate the browser during the testing process?

Steve: Yes. I mentioned a tool in my blog, the Agent Browser, which is essentially a wrapper for Playwright and provides a very useful CLI interface. I used it a lot in this project.

I would let it operate two environments simultaneously: one is the app router playground in the production environment, and the other is the Vinext implementation. Then I would give it instructions to reproduce problems, compare behaviors, and locate differences. This was very helpful during the debugging process. For example, once I said "the scrolling is not smooth enough", which is actually a very vague description, but the model was able to identify the problem by itself and give a solution, which really shocked me.

Scott: When I used the Agent Browser, I encountered a problem: the Opus model often couldn't handle screenshots, saying "the screenshot is too large", and then the whole session would crash. Did you have this problem?

Steve: Yes, and it was quite serious. In OpenCode, this situation would directly contaminate the entire session, and I had to start over. The problem is that some sessions are very valuable, so sometimes I would ask the model to compress the current context into a markdown file for later recovery or reuse.

Scott: Do you closely monitor the context? For example, use sub-agents to manage it?

Steve: I didn't do this in a very systematic way, and it's not perfect. Sometimes after compressing the context, the model would "go off-track" and needed to be re-guided. However, I noticed that OpenCode has made significant improvements in this regard recently.

In addition, I also maintained a file called discoveries.md to record various problems found during the process, such as compatibility issues between certain React or Webpack versions and Vite. Whenever I encountered a problem, I would record it so that the model could continue to make progress based on these "known conclusions" instead of repeatedly making the same mistakes.

Wes: I recently encountered a similar problem in a project: the model kept repeating the same error, such as wrongly introducing server-side code into the client-side module, and then got stuck in a loop of fixing. I finally had to write the solution in agents.md or an external document to force-constrain its behavior.

Steve: Based on this phenomenon, one important thing I've learned is that agents are extremely responsive to feedback. In contrast, humans are not good at quickly absorbing and iterating on feedback. If you tell a person "this is wrong, rewrite it", the effect may not be obvious, but for a model, after providing new context, it can often improve significantly.

Many people, when they first start using AI, will reject it because the first result is not good. But in fact, as long as you iterate a few more times, by the fourth or fifth time, it can usually get it right. This "ability to quickly correct mistakes" is the key.

Scott: Indeed, some people think the tool is useless after just one try.

Steve: This is because of programmers' thinking habits. Traditional programs are deterministic. If the code is wrong, it will be wrong every time it runs. But LLMs are in a "non-deterministic" middle ground, and this uncertainty is actually a feature. It may produce a very bad output the first time, but you can correct it, and it won't make the same mistake again. Of course, this also means there are risks. For example, it may generate wrong Terraform configurations and even damage the production environment. But if you correct it in time, it probably won't make the same mistake again. I'm not an extreme optimist about AI. I'm both excited about its potential and worried about the risks.

Wes: How is the overall quality of the code generated by AI? Are there obvious cases of "going off-track"?

Steve: Of course, there are. Every time I look at the code, I'm not very satisfied. The code is usually quite verbose and not in the style I would write. This project made me realize that the goal is not to write "elegant code", but to achieve compatibility, pass tests, and verify whether this path is feasible. This is an experiment, and the core is to explore the boundaries of AI, not to pursue perfect engineering practices. If code quality becomes a problem later, we can optimize it.

For example, part of the Vinext code is currently generated through template strings, which means the code is "stitched together" without type checking or linting and can only be verified through end-to-end tests. I actually don't like this approach, and it isn't conducive to maintenance. So we're now gradually refactoring, splitting this generated code out into a normal code structure that can be type-checked and linted. This is also a process of moving from "AI-generated" to "engineered".
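The maintenance problem with template-string codegen is that the output is just a string: no type checker or linter ever sees it, so a typo in the template only surfaces when the generated code runs. Vinext's real codegen emits JavaScript; the generic sketch below uses Python purely to illustrate the pattern, and the function and template are invented:

```python
# Illustration of template-string code generation, the pattern Steve
# wants to refactor away from. The route-module template here is
# hypothetical, not Vinext's actual generated code.
def generate_route_module(route: str, handler_name: str) -> str:
    # This string is never type-checked or linted; a mistake in the
    # template can only be caught by running the generated output
    # end-to-end.
    return f'''\
export const route = "{route}";
export default function {handler_name}(request) {{
  return render("{route}", request);
}}
'''

module = generate_route_module("/blog/[slug]", "BlogPage")
print("export default function BlogPage(request)" in module)  # True
```

Refactoring this into real, importable modules is what makes the code visible to the type checker and linter again, which is the "engineered" end state Steve describes.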

Scott: When I was building an AI workflow recently, I designed multiple processing stages for each function, such as linting, styling, UI, and accessibility, but it felt very costly.

Steve: This is exactly why I think "guardrails" are important. Testing, linting, and formatting are all necessary constraints, but at the same time, we can't completely restrict the model. The ideal way is to break tasks into small pieces most of the time and add clear constraints, but also allow the model to "freely play" at certain times, such as letting it redesign a module and come up with different ideas.

Scott: I also regularly ask the model to conduct an audit analysis and get some optimization points that I haven't considered myself.

Wes: How can the security of a system written by