Programming Agents: One of the Costliest Mistakes in Software Development History

A reckoning is coming.

"I dare to assert that introducing AI Agents into software development will become one of the costliest mistakes in the history of this field."

The person who said this is George Hotz. At 17, he became the first person to crack the iPhone; he later reverse-engineered the PlayStation 3, for which Sony sued him. He went on to found comma.ai and became the most unconventional figure in autonomous driving.

Over the past six months, Hotz has tried all the well-known AI programming Agents on the market. He used them to write code for tinygrad and to reverse-engineer a USB-to-PCIe chip. He has tried different models, different harnesses, and different prompts.

Last Sunday, he published his conclusion in a blog post titled "The Eternal Sloptember", arguing that the large-scale adoption of AI coding Agents will end in disaster, or at least come close to it.

Hotz's core argument is clear: Agents are not programmers. "Agents can't program, and it's becoming increasingly difficult for us to realize that they can't program," he wrote. "They are highly complex statistical models designed to mimic the distribution of 'programming'. The things they generate are bad, just becoming more and more subtly bad and harder to detect. And this is exactly the result of an increasingly accurate statistical model."

1 The Two Poles of AI Programming: Karpathy Sees Revolution, Hotz Sees Disaster

Five days earlier, Andrej Karpathy, one of the best-known researchers in AI, had just joined Anthropic and publicly stated a clear view: AI Agents have completely changed software development.

The two now represent the extremes of an unresolved debate in the industry, and both have the credibility to back their positions.

Hotz wasn't always so certain. He spent six months using Agents in real projects, including writing part of the code for tinygrad, his open-source deep-learning framework, and completely reverse-engineering the firmware of a USB-to-PCIe chip. In the end, he concluded that every time he could have "done better and faster" by hand. The pattern he observed was: "Agents will pile up all the progress in the front and then hand you a slot-machine lever, asking you to keep pulling it, hoping it will finish the final polishing. But it always falls short."

Hotz anticipated the most obvious counter-argument:

Before someone jumps in and says 'you're using it wrong', let me say: I've tried different models, different harnesses, and different prompts. That isn't where the problem lies. Those who say such things would probably say the same about a slot machine: Look, you should bet on five lines after getting a cherry. No wonder you can't win!

I'm not saying AI is useless. It's obviously useful. For most searches, it's definitely a better Google. If you need a quick prototype and don't care about polish, it's incredibly fast.

But is it a software engineer? It's far from the standards of any company I've worked for. The key is to know when to use it and when not to.

A second objection is that a programmer who sees craftsmanship as part of his identity will naturally resist tools that threaten to replace him. Hotz took this challenge seriously too, but rebutted it on the facts.

Hotz wrote: "I later thought about the so-called maintenance of self-worth. (Google's) AFL finds more bugs than LLMs, and no one feels this way because of it. Chess and Go are more popular than ever." He has a point: chess engines have beaten the best humans for decades, yet the game has only grown more popular.

So what worries him isn't being replaced. It is what will happen to code quality when everyone uses these tools at once, especially as large technology companies and Wall Street keep pushing for their large-scale use.

Hotz wrote: "I even think this argument is a bit like a psychological war created to sell Agents. The fear of loss is one of the few ways to motivate large companies. I just think they're making a huge mistake out of this fear."

He believes that in the end, Agents will cause more harm to large organizations than to high-performing individuals or small organizations.

Over the past six months, I've watched how my friends and colleagues have adopted these tools. The high performers all share one trait: they can correct errors, and most of the time they can tell when something is just junk. It does take some time to explore, experiment, and tune the outer loop: when to use the tools, when to trust them, and how to use them. But outside a few well-defined areas, I haven't seen any of them switch to a mode of 'not seriously reading and understanding every line of code'.

Now look at large organizations. The feedback loop is much slower, and alignment is much lower. The worst performers lack this ability to check themselves, and they will be the ones producing '10 times more code' with the help of Agents. What do you think this will do to the average output of an organization? And to the average output of the whole world?

Agents will eventually produce more code, more applications, and more features than ever before. It will be a golden age for floods of junk code and a dark age for high-quality masterpieces.
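The averaging argument in this passage can be made concrete with a small sketch. The line counts and defect rates below are invented purely for illustration (they are not figures from Hotz or from any study), and `org_defect_rate` is a hypothetical helper. It computes a volume-weighted defect rate, so whoever writes the most code dominates the average.

```python
def org_defect_rate(groups):
    """Volume-weighted defect rate: defective lines divided by total lines.

    `groups` is a list of (lines_written, defect_rate) pairs, one per group
    of engineers. All inputs here are hypothetical.
    """
    total_lines = sum(lines for lines, _ in groups)
    defective_lines = sum(lines * rate for lines, rate in groups)
    return defective_lines / total_lines


# Assumed scenario: careful reviewers ship 1,000 lines at a 2% defect rate,
# while engineers who do not check their output ship at a 10% defect rate.
before = org_defect_rate([(1_000, 0.02), (1_000, 0.10)])

# Same people, same individual defect rates, but the second group now
# produces 10 times more code with the help of Agents.
after = org_defect_rate([(1_000, 0.02), (10_000, 0.10)])

print(f"before: {before:.3f}, after: {after:.3f}")
```

Under these assumed numbers, the organization-wide defect rate rises from 6% to roughly 9.3% even though no individual got worse; only the mix of who produces the code changed.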

On the deeper technical questions, Hotz has switched camps. He said: "Although I don't fully agree with all their views, on the issue of LLMs, I'm now in the LeCun/Marcus camp. I don't think such models can truly achieve programming. I think the process is important."

In his view, a real programming Agent needs a world model, not the current RLVR-based approach. He was blunt about the latter: it's "the kind of thing that comments out failed tests and then tells you all tests have passed".

He believes the deeper problem lies in how we view a product. In the past, when people saw a piece of code or software, they assumed by default that a human-like creative process lay behind it. That assumption no longer holds. "Things may break in ways that were impossible in the past. And the signals like syntax and grammar that were used to judge the underlying quality in the past are no longer useful." Code written by Agents is not produced the way humans write code. The difference may look subtle statistically, but it becomes obvious when you try to understand the code and keep building on it as if it were human-written.

Hotz also warned those who are using AI Agents for serious software: "The real story of this era will be who can avoid hurting themselves in their AI mania."

2 Those Who Created the AI Programming Craze Start to Worry It Is Out of Control

Hotz isn't the only one sounding the alarm.

Mario Zechner and Armin Ronacher, the two engineers who created the core components of the popular OpenClaw AI Agent, now warn that AI tools billed as replacements for programmers are pushing a large amount of bad and even dangerous code into the world. They call this phenomenon "vibe slop": programmers no longer seriously design and test systems but let AI quickly piece something together, resulting in a lot of software that won't stand the test of time.

"The infrastructure is collapsing, and software is more bug-ridden than before," said Zechner, the creator of Pi, the internal framework at OpenClaw. "We can play this game for a few more months or even years, but it will eventually cost us."

Zechner and Ronacher aren't AI haters. They use AI themselves to handle the tedious parts of writing code, and Pi, the tool they created, is used by millions of people. Because they are in the thick of it, their warning isn't an empty cry from outsiders. They worry that many companies are trading short-term productivity for long-term trouble: the pipeline of junior talent is drying up, bugs are increasing, security vulnerabilities are emerging, and technical debt is accumulating.

Alphabet CEO Pichai said that 75% of Google's new code is generated by AI. Meta's Zuckerberg predicts that by 2026, AI will write and review most of the code of its AI team. But Zechner believes that these statements just show that many people don't understand what AI Agents can and can't do.

AI programming tools are good at generating new code but not at evaluating and upgrading existing software, especially the large and complex legacy systems inside mature companies. Start-ups that use vibe coding can get off to a fast start, but Zechner said that once a system grows to a certain scale, they will hit the same wall as large companies: the usefulness of AI Agents is limited.

Take Anthropic's Claude Code as an example. Zechner's evaluation is merciless: "Claude Code is one of the most broken pieces of software I've ever used in my life." He attributes these problems to its developers using AI to build it. Anthropic's product leader Catherine Wu defended the product but also admitted: "The ultimate responsibility still lies with humans."

Computer scientist Timothy B. Lee pointed out that Anthropic has some of the best AI engineers in the world, so this highly AI-dependent approach may work for them, but it may not suit all of the company's customers. Many companies rely on tacit knowledge that their in-house programmers have accumulated over years of working with internal software systems, and this knowledge doesn't appear in the training data of AI Agents.

"These models can easily go in the wrong direction, and someone has to notice this."

Zechner believes that a reckoning is coming.

He believes that large companies will soon realize that their over-emphasis on AI-generated code is driving up costs and degrading software quality. He believes that many small start-ups relying on vibe coding will go bankrupt. He also believes that cloud-based code repositories like GitHub, which host useful software tools, will keep filling up with AI-generated junk.

3 The Returns of AI Haven't Kept up with Its Consumption

If Hotz and Zechner are worried about code quality, Uber executives are worried about something else: money.

Uber Chief Operating Officer Andrew Macdonald said in an interview three days ago that within the company, the cost of AI is becoming increasingly difficult to justify as a "reasonable investment".

He noted that Uber CTO Praveen Neppalli Naga had told The Information in an interview in April this year that Uber had already used up its 2026 budget for Claude Code. The remark later spread online.

Macdonald said the remark caused an uproar within Uber, and people began to seriously discuss AI token consumption and the trade-offs it brings, such as whether it will affect staffing. After talking with several senior engineering leaders at Uber, he said, he realized that using more tokens doesn't mean the company can deliver proportionally more truly useful consumer features.

"This correlation doesn't exist yet," Macdonald said. "It's difficult to directly correlate one of these indicators with 'Okay, now we've actually produced 25% more useful consumer features'."

When that causal line can't be drawn, the cost of AI is hard to justify. Uber's CEO said earlier this month that the company is slowing down recruitment to offset its AI investment.

Macdonald added: If you're just a user sitting there thinking of all kinds of interesting use cases and don't have to pay for it yourself, AI does seem free. But the bill is ultimately paid by the company.

Some companies have started to adjust. Duolingo, for example, had planned to include AI usage in performance evaluations, but employees quickly asked: Are we using AI to do things better, or just to prove that we've "used AI"? The company then reversed the decision. Duolingo's CEO later admitted: "At that time, it felt like we weren't asking people to be responsible for the actual results but promoting the use of a certain tool; but in some cases, it's actually not applicable."

In April this year, Bryan Catanzaro, vice president of applied deep learning at Nvidia, said that AI hasn't reduced labor costs. In fact, the current cost of AI is higher than the company's existing labor costs. At least on his team, "the computing cost far exceeds the employee cost."

4 Conclusion

So, the real question isn't "People write bad code, and AI also writes bad code. What's the difference?"

The difference is that in the past, even the worst-written code came with at least a rough mental model in the mind of the person who wrote it: he knew why he wrote it that way. Now a large amount of AI-generated code is quickly submitted, merged, and released, and many people don't really understand it. They just see that it has passed the tests, and the tests themselves may be incomplete.

Bad code isn't new. What's new is that bad ideas can now turn into commits faster than ever, while understanding, review, and accountability haven't sped up to match.

Someone on Twitter said: "Wait another six months, and the continuous learning and memory system will solve these problems." Maybe. But the progress in the past six months hasn't made Hotz and Zechner more optimistic.

Reference Links:

https://geohot.github.io//blog/jekyll/update/2026/05/24/the-eternal-sloptember.html

https://archive.ph/iyszw

https://www.businessinsider.com/uber-coo-andrew-macdonald-ai-token-spending-harder-justify-2026-5

https://www.youtube.com/watch?v=y_mQ6xLcKyc&t=1776s

This article is from the WeChat official account "InfoQ", author: Tina, published by 36Kr with authorization.

