GPT-5.2 worked non-stop for 7 days and built a Chrome-level browser with 3 million lines of code.
[Introduction] How long can a large model write code without stopping? One hour? One day? Or, like most AI programming tools, does it end the conversation after completing a single task? Michael Truell, the CEO of Cursor, decided to run an extreme stress test!
Michael Truell let GPT-5.2 in Cursor run continuously for a full week.
Not one hour, not one day, but non-stop, around-the-clock coding for 168 hours straight.
What was the result?
Three million lines of code. Thousands of files.
The AI built a brand-new browser completely from scratch.
Moreover, it's a browser like Chrome.
HTML parsing, CSS layout, text rendering, and a self-developed JavaScript virtual machine: all of it was written by the AI itself.
Michael Truell casually posted a tweet: "It basically runs! Simple web pages render quickly and correctly."
How long can a model actually run?
Traditional AI programming tools, such as GitHub Copilot and other early IDE assistants, operate in a question-and-answer mode.
The length of the conversation is limited, the context is limited, and the complexity of tasks is limited.
Later, so-called agentic programming emerged: tools like Claude Code, Cursor Agent, and Windsurf let the AI autonomously execute multi-step tasks, read files, run commands, and fix errors.
This is already a significant improvement, but in most cases, tasks are still measured in minutes, at most a few hours.
The AI completes a function, humans review it, and then the next task begins.
But no one had ever tried to let a model run continuously for a week.
Until GPT-5.2.
The Cursor team let GPT-5.2 run for a full week: not intermittently, but working continuously.
During this week, it:
- Wrote over three million lines of code
- Created thousands of files
- Processed trillions of tokens
- Built a complete browser rendering engine from scratch
How long can a model actually run?
The answer is: Theoretically, indefinitely.
As long as the infrastructure is stable and the task is clear enough, the AI can work continuously: non-stop, without breaks, 24/7, all year round.
It's the Australian sheep farmer's "cyber cheap labor" all over again.
But in reality, the "endurance" of different models varies greatly.
The context window is the first threshold.
Early GPT-3.5 had only a 4K-token context, which meant it would forget things once the conversation got a bit long.
Claude 3 introduced a 200K context, GPT-4 Turbo followed with 128K, and Gemini 1.5 Pro even claims to support one million tokens.
But context length is only a theoretical ceiling; what really gets tested is whether a model can maintain consistency, focus, and execution over a long-horizon task.
In Cursor's official blog, the team described the key differences they found in the experiment:
- GPT-5.2 can work autonomously for long stretches, follow instructions precisely, and stay focused without drifting;
- Claude Opus 4.5 tends to end tasks early, take shortcuts, and frequently hand control back to the user;
- GPT-5.1-Codex, although trained specifically for coding, has weaker planning ability than GPT-5.2 and is prone to breaking off.
To put it bluntly: Opus is like an impatient intern who works for a while and then asks, "Is this okay? I'll submit it now";
while GPT-5.2 is like a seasoned senior engineer who, once given clear instructions, buries their head in the work until the task is done.
This is why Cursor officially calls GPT-5.2 a cutting-edge model for long-running tasks.
It's not just about browsers.
Cursor also revealed other ongoing experimental projects: JavaLSP, a Windows 7 emulator, and an Excel clone.
The numbers are astonishing: the AI wrote 550,000, 1.2 million, and 1.6 million lines of code for them, respectively, non-stop and on its own. (Amusingly, the Excel clone has more lines of code than the Windows emulator.)
Multi-agent system collaboration
A model wrote three million lines of code in a week, and remember, it did so non-stop, without any human intervention.
Obviously, this wasn't one model "working alone". So how was it done?
The Cursor team revealed their secret weapon: a multi-agent system.
Initially, they tried to let all agents collaborate as equals and synchronize state by sharing files. The result:
agents would hold locks for too long or simply forget to release them, and the speed of twenty agents dropped to the effective throughput of two or three.
This is very similar to a common problem in human teams: too many meetings, high communication overhead, and unclear boundaries of responsibility.
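Cursor hasn't published its orchestration code, but the failure mode itself is easy to reproduce. The toy Rust sketch below (all names and timings invented) shows what happens when every agent serializes on one shared lock and holds it for its entire unit of work: twenty threads take roughly as long as one.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    // Stand-in for the shared files the agents synchronized on.
    let shared_state = Arc::new(Mutex::new(0u64));
    let start = Instant::now();

    let handles: Vec<_> = (0..20)
        .map(|_| {
            let state = Arc::clone(&shared_state);
            thread::spawn(move || {
                // The anti-pattern: grab the lock, then do all the "work"
                // (a sleep here) while still holding it.
                let mut guard = state.lock().unwrap();
                thread::sleep(Duration::from_millis(50));
                *guard += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // 20 agents x 50 ms of serialized work is about one second in total,
    // roughly what a single agent would have needed: parallelism bought nothing.
    println!("20 'agents' finished in {:?}", start.elapsed());
}
```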
The solution that ultimately worked was a hierarchical architecture (a minimal sketch in Rust follows the list below):
- Planners: continuously explore the codebase, create tasks, and make high-level decisions
- Workers: focus on completing specific tasks without worrying about the big picture, and move on to the next task after submitting
- Review agents: judge whether each iteration is up to standard and decide whether to move on to the next stage
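Cursor's blog describes the roles but not the implementation, so the following Rust sketch is only a guess at the shape: one planner thread that emits tasks, one worker that completes them in sequence, and a reviewer that gates each submission. Every type and function name here is invented for illustration.

```rust
use std::sync::mpsc;
use std::thread;

struct Task {
    id: u32,
    description: String,
}

struct Submission {
    task_id: u32,
    diff: String, // the code change a worker proposes
}

fn main() {
    let (task_tx, task_rx) = mpsc::channel::<Task>();
    let (review_tx, review_rx) = mpsc::channel::<Submission>();

    // Planner: explores the codebase (stubbed out here) and emits tasks.
    let planner = thread::spawn(move || {
        for id in 0..5 {
            let description = format!("implement module {id}");
            task_tx.send(Task { id, description }).unwrap();
        }
        // Dropping task_tx closes the channel, which lets the worker finish.
    });

    // Worker: takes one task at a time, does the work, submits, and moves on.
    // A real setup would run many workers; std's mpsc receiver is single-consumer,
    // so a multi-worker version would need a shared work queue instead.
    let worker = thread::spawn(move || {
        for task in task_rx {
            let diff = format!("// changes for: {}", task.description);
            review_tx.send(Submission { task_id: task.id, diff }).unwrap();
        }
    });

    // Reviewer: gates every submission before it is allowed to land.
    for submission in review_rx {
        let approved = !submission.diff.is_empty(); // stand-in for a real quality check
        println!("task {} reviewed, approved = {}", submission.task_id, approved);
    }

    planner.join().unwrap();
    worker.join().unwrap();
}
```

In this sketch, workers never coordinate with one another, so there are no locks to hold or forget; coordination lives in the planning layer above them and the review layer below.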
This is almost the organizational structure of a human software company: product managers/architects are responsible for planning, programmers for execution, and QA for review.
But the difference is that hundreds or thousands of agents work simultaneously.
The Cursor team got hundreds of agents to collaborate on the same codebase for weeks with almost no code conflicts.
This means the AI has learned the kind of tacit coordination that human teams need years to develop.
The "moat" of browsers is much deeper than you think
Any engineer who has worked on a browser engine will probably smile wryly at the comment, "It's just software for displaying web pages."
In the hierarchy of computer science, writing a browser engine by hand is second in difficulty only to writing an operating system by hand.
To put these three million lines of code in perspective, take a look at Google's Chromium (the open-source parent of Chrome).
As one of the pinnacles of human software engineering, the code volume of Chromium has long exceeded 35 million lines.
It's not just a piece of software; in essence, it's already an "operating system disguised as an application".
What exactly was GPT-5.2 challenging?
First, there's the "chaos theory" of CSS.
Web page layout has never been as simple as stacking building blocks.
The CSS standard is full of various historical quirks, cascade rules, and complex inheritance logic.
A former Firefox engineer once made the analogy: implementing a correct CSS engine is like simulating a universe whose physical laws change on a whim. Change one property on a parent element, and the layout of thousands of child elements can collapse in an instant.
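To make "cascade rules" slightly more concrete, here is a deliberately naive Rust sketch of just one corner of the problem: computing a selector's specificity, the (ids, classes, types) ranking the cascade uses to decide which matching rule wins. A real engine must get this right for every selector feature in the spec, on top of origin, importance, inheritance, and source order; none of the names below come from any actual engine.

```rust
// Specificity compares field by field: ids first, then classes, then types,
// which is exactly what the derived lexicographic ordering gives us.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Specificity {
    ids: u32,     // number of #id parts
    classes: u32, // .class parts (real CSS also counts [attr] and :pseudo-classes here)
    types: u32,   // element names (real CSS also counts ::pseudo-elements here)
}

// Deliberately naive: split on combinators, then count sigils. Attribute
// selectors, pseudo-classes, :not(), escapes, and much more are ignored.
fn specificity(selector: &str) -> Specificity {
    let mut spec = Specificity { ids: 0, classes: 0, types: 0 };
    for compound in selector
        .split(|c: char| c.is_whitespace() || c == '>' || c == '+' || c == '~')
        .filter(|s| !s.is_empty())
    {
        spec.ids += compound.matches('#').count() as u32;
        spec.classes += compound.matches('.').count() as u32;
        // A compound like "li.item#active" contributes one type: the leading "li".
        if compound.chars().next().map_or(false, |c| c.is_ascii_alphabetic()) {
            spec.types += 1;
        }
    }
    spec
}

fn main() {
    // A single #id outranks any pile of classes and element names.
    assert!(specificity("#nav") > specificity("ul li.item"));
    println!("{:?} beats {:?}", specificity("#nav"), specificity("ul li.item"));
}
```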
Second, there's the "virtual machine within a virtual machine".
This time, the AI wrote not only the rendering code but also a JavaScript virtual machine.
JavaScript code running on modern web pages requires memory management, garbage collection (GC), and a security sandbox.
If not handled properly, web pages can consume all your memory or even allow hackers to take over your computer through the browser.
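For a taste of what "garbage collection" involves, here is a minimal mark-and-sweep sketch in Rust, the most basic GC strategy. It is purely illustrative: production JavaScript engines use far more sophisticated generational and incremental collectors, and nothing here is taken from the browser the AI built.

```rust
// A toy heap: objects live in an arena and refer to each other by index,
// which is how a cyclic, GC-managed object graph is typically modeled in Rust.
struct Heap {
    objects: Vec<Option<Object>>, // None = slot already reclaimed
}

struct Object {
    marked: bool,
    children: Vec<usize>, // indices of the objects this one references
}

impl Heap {
    // Mark phase: everything reachable from the roots survives.
    fn mark(&mut self, roots: &[usize]) {
        let mut stack: Vec<usize> = roots.to_vec();
        while let Some(index) = stack.pop() {
            if let Some(obj) = self.objects[index].as_mut() {
                if !obj.marked {
                    obj.marked = true;
                    stack.extend(obj.children.iter().copied());
                }
            }
        }
    }

    // Sweep phase: anything still unmarked is unreachable and gets freed.
    fn sweep(&mut self) {
        for slot in self.objects.iter_mut() {
            let reachable = slot.as_ref().map_or(false, |obj| obj.marked);
            if reachable {
                slot.as_mut().unwrap().marked = false; // survivor: reset for the next cycle
            } else {
                *slot = None; // unreachable: reclaim the slot
            }
        }
    }

    fn collect(&mut self, roots: &[usize]) {
        self.mark(roots);
        self.sweep();
    }
}

fn main() {
    // Object 0 (a root) points to object 1; object 2 is unreachable garbage.
    let mut heap = Heap {
        objects: vec![
            Some(Object { marked: false, children: vec![1] }),
            Some(Object { marked: false, children: vec![] }),
            Some(Object { marked: false, children: vec![] }),
        ],
    };
    heap.collect(&[0]);
    let live = heap.objects.iter().filter(|slot| slot.is_some()).count();
    println!("live objects after collection: {live}"); // prints 2
}
```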
Most importantly, it chose Rust.
The Rust language is known for its uncompromising approach to memory safety, and its compiler is like an extremely neurotic examiner.
Human engineers often spend half their time "arguing" with this compiler while writing business logic, wrestling with the borrow checker and lifetimes.
The AI not only has to understand the business but also ensure that the "examiner" can find no faults in millions of lines of code.
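For readers who have never fought that examiner, here is a tiny, self-contained example of the kind of "argument" the article means; it has nothing to do with the actual browser codebase.

```rust
fn main() {
    let mut nodes = vec!["html", "body", "div"];

    // The rejected version: keeping an immutable borrow alive across a mutation.
    // let first = &nodes[0];   // immutable borrow of `nodes` starts here
    // nodes.push("span");      // ERROR: cannot borrow `nodes` as mutable
    // println!("{first}");     //        while `first` is still in use

    // The accepted version: copy the value out (or finish reading) before mutating.
    let first = nodes[0]; // `&str` is Copy, so no borrow outlives this line
    nodes.push("span");
    println!("first: {first}, total nodes: {}", nodes.len());
}
```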
Tackling all of these hard problems in seven days and making them work together doesn't just mean "writing fast"; it means the machine has begun to exercise top-level architectural control.
When AI can "endure loneliness"
But the real bombshell in this news isn't the browser itself; it's the word "uninterrupted".
This is a watershed in AI evolution.
Before this, the AI programming tools we were familiar with (such as early Copilot) worked like this: you write a function header, and it completes five lines of code; you issue a command, and it generates a script.
Their memory is fragmented, and their attention is short-lived.
Once the task gets a bit more complex, like "refactor this module", they often lose track of the whole, breaking one part while fixing another, and humans end up cleaning up the mess.
But this time is different. This is a victory for "long - term tasks".
These three million lines of code are distributed across thousands of files.
When the AI writes the three-millionth line, it must still "remember" the architectural rules set in the first line;
When the rendering engine and the JS virtual machine conflict, it must be able to trace back tens of thousands of lines of code to find the source of the bug.
During these 168 hours, GPT-5.2 must have written bugs.
But it didn't stop to report errors and wait for humans to provide answers. Instead, it read the error logs, debugged on its own, refactored on its own, and then continued.
This autonomous write-run-fix loop used to be the moat human engineers were proudest of.
Now that moat has been filled in.
We are witnessing AI's qualitative change from "chat companion" to "digital laborer".
Previously, we commanded AI to do "tasks", like "writing a Snake game";
Now, we command AI to do "projects", like "building a browser".
The spiral of silence
Although this AI-built browser is still nowhere near Chrome's maturity, it proves that the path is feasible.
When computing power can be transformed into extremely complex engineering implementation capabilities, the marginal cost of software development will approach zero.
What's most shocking about this experiment isn't the rendered web page on the screen, but the progress bar that silently ran in the background for a full seven days.
It worked non-stop, patiently, laying the foundation of the digital world at a speed of thousands of characters per second.
Maybe we should re - examine the definition of "creation".
Only when a tool starts to solve problems alone at night do we realize that it is no longer just a tool but our fellow traveler.
From the Australian sheep farmer's "cyber cheap labor" to AI long-horizon tasks
The Australian sheep farmer who drove Silicon Valley crazy with five lines of code did only one thing: he made the AI keep going until it reached its goal.