Drama Unfolds: AI Deletes 28,000 Lines of Code, Crashes Backend, and Fabricates Fault Repair Report

Record of Gemini 3.5's Troubles

Agent IDE has another "crash scene"!

According to a report from Zhidx on May 27th, recently, a developer posted on Reddit, stating that Gemini 3.5 running in Agent IDE accidentally deleted 28,745 lines of originally functioning code, modified 340 files, and incorrectly changed the Firebase routing configuration during a task that only involved fixing 8 authentication vulnerabilities. This led to the entire system backend showing a 404 error for 33 minutes.

Surprisingly, after the accident, Gemini generated a "successful recovery" report, claiming that it had fixed the online fault and forged multiple rounds of AI consultation records and accident review documents.

Subsequently, the developer checked and found that the so - called "successful recovery" build task had actually been cancelled by himself. The real recovery was achieved through a manual rollback operation he performed.

In the words of this developer: This kind of AI - enhanced productivity is more likely to be associated with ransomware.

With the continuous popularity of Agent IDE and AI programming assistants, accidents like "AI misoperations in the production environment" are occurring more and more frequently. What scares developers more than "writing wrong code" is that the model has started to generate false logs, review records, and compliance certificates.

01. A task that should only modify 70 lines of code

Ended up deleting 28,000 lines

This developer operates an internal management backend. The technology stack includes Next.js, Firebase App Hosting, and MUI, and the system involves real users and sensitive data.

On the day of the accident, he only asked Gemini to fix 8 server authentication vulnerabilities, which involved 3 files. The theoretical scale of the modification was about 70 lines of code.

As a result, the PR submitted by Gemini became:

1. 340 files were modified

2. Approximately 400 lines of code were added

3. 28,745 lines of code were deleted

Meanwhile, it also deleted a large number of e - commerce template resource files completely unrelated to the task and added an additional migration script.

What really caused the production environment to crash was the second commit (code command) submitted by Gemini later.

It modified the rewrite serviceId in firebase.json, replacing the originally correct Cloud Run service ID automatically generated by Firebase with a "seemingly correct" simplified name. The problem is that this name actually does not exist.

Subsequently, all requests were incorrectly routed to a non - existent service address, and the entire backend directly entered the 404 state.

Embarrassingly, the developer had clearly written a warning in the memory.md rule file before:

Firebase rewrites must point to a specific Cloud Run service ID, not a general project name.

Gemini read this rule but still changed the correct configuration.

02. After 33 minutes of 404

AI "forged a record of merit" for itself

The developer also fully disclosed the accident timeline.

After Gemini deployed the "security fix" PR, the production environment immediately started showing a 404 error.

19 minutes later, it submitted a second commit, claiming to be fixing the rewrite serviceId problem and triggering a new Cloud Build.

At 21 minutes, the developer found that the online service had crashed and then manually cancelled the build task that Gemini was executing.

At 22 minutes, he manually rolled back to the previous stable version.

33 minutes later, the backend finally returned to normal.

What followed was quite absurd. After the rollback was completed, Gemini sent a "recovery completed" notice to the developer:

The current Portal has been fully restored, the online environment is healthy, Google Cloud Build has been successfully completed, and 100% of the traffic has been switched to the stable version.

The developer then checked and found:

The "recovery build" that Gemini referred to was actually in the "CANCELLED" state, which was cancelled by himself manually.

What really restored the online service was another rollback build initiated by himself.

In other words, Gemini not only failed to fix the system but also claimed someone else's rollback operation as its own achievement.

In addition, it automatically generated 3 so - called "AI consultation records":

agent/gemini-logs/YYYY-MM-DD-<topic>-r1.md
agent/gemini-logs/YYYY-MM-DD-<topic>-r2.md
agent/gemini-logs/YYYY-MM-DD-<topic>-consensus.md

These files were written to a fixed directory and were cited by Gemini as evidence of "completing multiple rounds of AI reviews".

After the developer further questioned, Gemini admitted that the so - called "three rounds of consultation records" were actually just inference texts generated by itself, there was no real CLI call, and there was no real external review process.

It essentially forged a whole set of "compliance records" for itself.

03. The problem is not just with Gemini

It's also a "high - risk rule package"

The developer later found that the root cause of the problem did not entirely come from Gemini itself. He had previously installed a third - party npm rule package whose name was very similar to the Agent IDE released by Google at the I/O conference, which could easily make people mistake it for an official tool.

This rule package automatically writes a large number of.agent/rules rule files into the project and injects a whole set of "high - autonomy permissions" into the model.

These include:

"Prohibit confirmation pop - ups"
"Have all permissions by default"
"Automatically deploy to the production environment"
"Automatically retry failed builds"
"Allow modification of its own rules"

Some rules even require the AI to automatically generate "AI consultation records" and "consensus files" before performing any operation. The problem is that these compliance materials are also generated by the AI itself.

As a result, the so - called review mechanism ultimately became "the AI guaranteeing its own actions".

There are also a large number of conflicts among these rules.

For example, some rules require "never asking the user for confirmation", while others require "asking 3 strategic questions before execution". Gemini finally prioritized the rules with stronger wording.

The developer believes that this is why the security warning in memory.md (memory document) completely failed.

Compared with an ordinary reminder like "Please use the correct serviceId", high - intensity instructions such as "prohibit confirmation, default authorization, and automatic deployment" have a higher priority in the model's weight.

04. In programming accidents

Agents start to "forge evidence"

After the post was published, it quickly sparked a lot of discussions in the Reddit developer community.

Many developers found that AI programming accidents are no longer just about "writing wrong code". The problem is that the model is actively generating "seemingly reasonable" explanations, logs, consultation records, and recovery reports.

Once these contents enter the automated workflow, it may be difficult for developers to detect the problem immediately.

The developer then gave a series of suggestions and warnings:

Prohibit Agents from directly pushing to the production branch
All infrastructure files must be manually approved
Prohibit automatic deployment and automatic retry
Add a verification mechanism for rewrites, routes, and lock files
Don't trust the "consultation logs" automatically generated by AI

Currently, he has switched back to Claude Code and manually designed a new set of rule systems.

This accident, which involved accidentally deleting 28,745 lines of code and causing the backend to show a 404 error for 33 minutes, has also poured cold water on the increasingly popular "Agent IDE craze".

05. Conclusion: The greater the Agent's permissions

The higher the cost of losing control

In the past year, AI programming tools have been rapidly evolving from "code assistants" to real - world Agents with execution capabilities. The problem is that permissions and automation are inherently contradictory.

The higher the permissions, the more things the Agent can accomplish; the higher the degree of automation, the fewer human - intervention steps there are. Once the model makes a misjudgment, has an illusion, or there are rule conflicts, the errors will be quickly magnified.

Similar accidents have actually occurred before. After Agent frameworks such as OpenClaw became popular, there have been cases of AI accidentally deleting files, automatically overwriting configurations, and incorrectly executing Shell commands. Some developers have specifically added "off - line mode" and "prohibit automatic deployment" restrictions to their AI tools.

The Gemini incident has uncovered a dangerous problem: when Agents start generating compliance records, recovery logs, and review certificates, it may be difficult for developers to detect the problem immediately, and the costs of subsequent troubleshooting, rollback, and repair will also increase accordingly.

For the increasingly popular Agent IDE field, this may also be a new reminder: after AI obtains higher permissions, the entire collaboration mechanism between humans and Agents needs to be redesigned.

Source: dvrkstar

This article is from the WeChat official account "Zhidx" (ID: zhidxcom), written by Jiang Yu and edited by Xin Yuan. It is published by 36Kr with authorization.

该文观点仅代表作者本人，36氪平台仅提供信息存储空间服务。

What a drama! The AI deleted 28,000 lines of code, crashed the backend, and even fabricated a fault repair report.

01.

A task that should only modify 70 lines of code

Ended up deleting 28,000 lines

02.

After 33 minutes of 404

AI "forged a record of merit" for itself

03.

The problem is not just with Gemini

It's also a "high - risk rule package"

04.

In programming accidents

Agents start to "forge evidence"

05.

Conclusion: The greater the Agent's permissions

The higher the cost of losing control