After 17 years of writing open-source code, why do I think piling features onto coding agents is a waste of time?
In today's era of chaotic competition among AI programming tools, we seem to have grown used to ever more features being piled on. But in the eyes of Mario Zechner, the founder of libGDX and a 17-year open-source veteran, all of this is becoming increasingly uncontrollable.
"When you find that AI is secretly modifying your context behind your back and you are completely unaware of it, the loss of a sense of control is extremely dangerous."
Recently, at the developer conference hosted by Tessel, Mario not only publicly aired his complaints about Claude Code and OpenCode but also introduced his minimalist "rebellious work", pi. It is a terminal coding agent with only four tools: read, write, edit, and bash. It has the shortest system prompt among mainstream agents, yet it is extremely extensible and lets developers regain control.
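For flavor, here is a hypothetical sketch of what such a four-tool surface could look like, written as generic JSON-schema tool definitions. Only the tool names come from the talk; the schema shape and every field name are assumptions, not pi's actual source.

```typescript
// Hypothetical four-tool agent surface in the spirit of pi.
// The names (read, write, edit, bash) match the talk; everything else is illustrative.
const tools = [
  {
    name: "read",
    description: "Read a file from disk",
    parameters: { type: "object", properties: { path: { type: "string" } }, required: ["path"] },
  },
  {
    name: "write",
    description: "Create or overwrite a file",
    parameters: {
      type: "object",
      properties: { path: { type: "string" }, content: { type: "string" } },
      required: ["path", "content"],
    },
  },
  {
    name: "edit",
    description: "Replace an exact text span inside a file",
    parameters: {
      type: "object",
      properties: { path: { type: "string" }, oldText: { type: "string" }, newText: { type: "string" } },
      required: ["path", "oldText", "newText"],
    },
  },
  {
    name: "bash",
    description: "Run a shell command and return stdout/stderr",
    parameters: { type: "object", properties: { command: { type: "string" } }, required: ["command"] },
  },
];
```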
This article is based on a video of the talk and has been edited by InfoQ.
The core points are as follows:
- Claude Code is now like a spaceship. It has so many features that you've probably used 5% of them and know about 10%. The remaining 90% is the "dark matter" of the AI-agent world: no one knows what it's really doing behind the scenes.
- In existing coding harnesses, many features may not be necessary for good results. You don't need file tools, sub-agents, or web search. You don't need any of it.
- We are currently in a stage of "messing around and seeing the results". No one knows what a perfect coding agent should look like. We need a better way of messing around: coding agents must be self-modifiable and highly malleable, so that we can quickly experiment with new ideas and see if we can arrive at new industry standards or workflows.
- The only time when linting and type checking are really needed is when the agent thinks it has completely finished its work.
1 ChatGPT→Copilot→Aider→Claude Code
Around April 2025, Peter Steinberger (the founder of OpenClaw) came to me and Armin Ronacher (the co-founder of Sentry and creator of the Flask web framework) and said, "Coding agents have really gotten to the point where they can do the job." My first reaction was, "Oh, just shut up!" I really didn't believe it. But a month later, a few of us locked ourselves in an apartment for 24 hours, immersed in the world of these clankers, churning out vibe code and vibe slop all night.
We kept building things and made a lot, but most of it we never used ourselves. This was the new normal from 2025 to 2026: we wrote a lot of code and reinvented a lot of wheels, but only a few were actually used. Eventually I started to think: I hate all the existing coding agents and harnesses. How hard could it be to write one myself? At that point Peter said, "I just want to make a little thing of my own." You probably know how the story unfolded from there.
Today, I'm going to tell my not-so-earth-shattering story, but I hope to share some industry insights I've gained in the past few months.
Let's first talk about how coding agents evolved.
Before 2025, the situation was basically this: you copied code out of ChatGPT, but most of it was fragmentary; it could usually only write the simple functions you didn't want to write yourself. Then came GitHub Copilot, integrated into Visual Studio Code. You just kept hitting Tab; sometimes it worked, most of the time it didn't. Occasionally it would even "kindly" hand you a piece of GPL-licensed code, like John Carmack's fast inverse square root. Later there was Aider, and around the same time, AutoGPT.
Finally, Claude Code made its debut. I remember they released the beta in November 2024, but it really took off around February or March 2025. At the time I thought it was amazing. The Claude team was excellent: very active on social media, all of them geniuses.
To be honest, they basically created the entire category. Aider and AutoGPT paved the way, but none of them reached this level. This is the so-called agentic search paradigm: instead of indexing your codebase and doing all kinds of complex builds the way Cursor does (which may not work anyway), the Claude team trained the model directly, via reinforcement learning, to use file tools and bash tools. That way it can explore your codebase in real time, find the information it needs to understand the code, and modify it directly. The effect was amazing. We didn't sleep at all, because the amount of code produced was many times what we used to write by hand.
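The loop behind that paradigm is simple enough to sketch. Here is a minimal version under the assumption of a generic chat API: the model alternates between emitting tool calls and receiving their results until it stops asking for tools. `llm` and `executeTool` are placeholders, not any vendor's real API.

```typescript
// Minimal sketch of the agentic-search loop: the model keeps calling tools
// (read files, grep, run bash) until it has nothing left to ask for.
type ToolCall = { name: string; args: Record<string, unknown> };
type Turn = { text: string; toolCalls: ToolCall[] };
type Message = { role: string; content: string; name?: string };

async function agentLoop(
  task: string,
  llm: (messages: Message[]) => Promise<Turn>,
  executeTool: (call: ToolCall) => Promise<string>
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: task }];
  while (true) {
    const turn = await llm(messages);
    if (turn.toolCalls.length === 0) return turn.text; // no tools requested: done
    for (const call of turn.toolCalls) {
      // Explore/modify the codebase live, instead of relying on a prebuilt index.
      const result = await executeTool(call);
      messages.push({ role: "tool", name: call.name, content: result });
    }
  }
}
```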
At the time, it was simple and predictable, and it fit my workflow perfectly. But then they fell into a trap many of us fall into: since these clankers can write so much code, why not have them write every feature we can think of? Sounds like a good idea, right? Add this feature, add that feature, keep adding... In the end we got a monster like the car Homer Simpson designed. Claude Code is now like a spaceship. It has so many features that you've probably used 5% of them and know about 10%. The remaining 90% is the "dark matter" of the AI-agent world. No one knows what it's really doing behind the scenes.
2 Claude Code is not a stable, good tool
I personally don't find it useful, because I've always believed that developers need to know what the agent is really doing. We're here at the Tessel event, and they also like talking about context management and engineering. But I eventually found that Claude Code is not a good tool in terms of observability and context management. And who can stand Claude Code's endless, inexplicable flickering?
Thariq Shihipar, an expert in developer relations at Anthropic, sometimes says some confusing things on Twitter, such as: "Our terminal user interface is now a game engine."
I come from the game development industry; that's my old trade. When I see words like that, my heart aches. It's just a terminal interface. The reason you think it's a game engine is that you used React in a terminal UI, and as a result it takes 12 milliseconds to re-render the entire UI tree. Please don't do this. It's really not a game engine.
Later, Mitchell Hashimoto, who wrote Ghostty, couldn't stand it anymore and said, "This sounds a bit offensive. Don't blame Ghostty or other terminals. It's just that your code is too bad." A terminal can render a frame in under 1 millisecond and run at hundreds of frames per second, so don't use it as an excuse.
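The point is easy to demonstrate: a TUI only needs to rewrite the lines that actually changed, using plain ANSI cursor addressing, rather than re-rendering a whole component tree every frame. A toy sketch (the escape sequences are standard VT100; everything else is illustrative):

```typescript
// Repaint only the lines that changed since the last frame.
// No framework, no full-tree re-render; this is microseconds of work.
let previous: string[] = [];

function paint(next: string[]) {
  for (let row = 0; row < next.length; row++) {
    if (next[row] === previous[row]) continue; // line unchanged: skip it
    // ESC[<row>;1H moves the cursor; ESC[2K clears the line before rewriting.
    process.stdout.write(`\x1b[${row + 1};1H\x1b[2K${next[row]}`);
  }
  previous = next;
}
```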
They fixed the flickering later, but other problems followed. You can feel that they've gone all-in on so-called vibe coding, and the feeling is especially obvious when you use Claude Code every day. I'm not trying to belittle their effort or their achievements. Claude Code is still the leader in this category; they created all of this and did a great job. I'm just an old guy who likes simple, predictable tools, and it no longer fits my workflow or my needs.
Moreover, they quietly make a lot of changes to your context in the background. In the summer of 2025 I wrote a bunch of tools to intercept the requests Claude Code sends to the backend, to see what extra text they were stuffing into my context behind my back. I found that these injections were highly redundant and changed every day. They might ship one version today and another tomorrow, and the timing and manner of the injections kept changing, which will straight-up break your existing workflow. It's not a stable tool.
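The interception itself is not magic. A minimal sketch of the idea: run a local logging proxy and point the agent's base URL at it, assuming the tool honors an environment variable such as ANTHROPIC_BASE_URL (many SDK-based tools do, but treat that as an assumption). Response-header forwarding and streaming are elided here.

```typescript
// Local logging proxy: dumps every request body the agent sends upstream,
// so you can see exactly what lands in your context.
import http from "node:http";

const UPSTREAM = "https://api.anthropic.com";

http.createServer(async (req, res) => {
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const body = Buffer.concat(chunks);

  // The interesting part: system prompt, messages, and tool definitions.
  console.error(`${req.method} ${req.url}\n${body.toString("utf8")}\n`);

  // Forward unchanged, minus the host header.
  const headers = Object.fromEntries(
    Object.entries(req.headers).filter(
      ([key, value]) => key !== "host" && typeof value === "string"
    )
  ) as Record<string, string>;
  const upstream = await fetch(UPSTREAM + req.url, {
    method: req.method,
    headers,
    body: body.length > 0 ? body : undefined,
  });
  res.writeHead(upstream.status);
  res.end(Buffer.from(await upstream.arrayBuffer()));
}).listen(8080);
```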
I understand their position. They need to experiment, and they have a huge user base; running experiments on a user group that large is genuinely hard. But they don't consider how it lands on users, so we all suffer: you're using this new tool, trying to build a predictable workflow, and then the vendor changes some small, unannounced detail under the hood, and suddenly the LLM goes haywire on your existing tasks. That's simply unsustainable. I need a sense of control; I can't rely on them to provide me with a so-called "stable environment".
Their UI design also comes at the cost of observability. I personally don't like that, but it's just a preference; I know most people are satisfied with the amount of information Claude Code shows. Beyond that, it obviously offers no model choice, since it's Anthropic's native tool. That's not a bad thing in itself, but it has almost no extensibility. They do have a hook system, but if you compare it with what pi can do, the integration isn't deep. And it essentially spawns a process every time a hook event fires; if you have to start that process over and over, the cost is really very high.
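The overhead argument is easy to check yourself. A rough sketch comparing a spawn-per-event hook with an in-process callback (numbers will vary by machine; `true` stands in for any hook executable):

```typescript
// Compare spawn-per-event hooks with an in-process callback.
import { execFileSync } from "node:child_process";

console.time("spawned hook x100");
for (let i = 0; i < 100; i++) execFileSync("true"); // cheapest possible child process
console.timeEnd("spawned hook x100"); // typically hundreds of milliseconds in total

const hooks: Array<(event: string) => void> = [() => {}];
console.time("in-process hook x100");
for (let i = 0; i < 100; i++) hooks.forEach((h) => h("tool-call"));
console.timeEnd("in-process hook x100"); // typically microseconds
```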
Later, I lost interest in Claude Code entirely. It's not that it's bad; it just no longer suits me. Over that period it became more suitable for the general public, which means they're on the right track, but it's not for an old-fashioned guy like me.
3 The underlying design of OpenCode makes me lose confidence
So I started looking for alternatives. First there was Codex CLI. At the beginning I didn't like it, neither the interface nor the model, but its model performance is genuinely amazing now. Then there was AMP. The core members of that team used to work at Sourcegraph before striking out on their own, and they are extremely good engineers. They built a very commercial coding harness, and they won the market by cutting features rather than stacking them. Much of their design logic matches mine exactly. If you want a commercial coding harness, I definitely recommend AMP. Factory has a similar philosophy and is very solid, but it's not as radical and experimental as AMP.
Then there's OpenCode, an open-source harness that many people use. I have a passion for open source; I've been in the open-source world for 17 years, managing projects of all sizes, and it means a lot to me. So I thought, since OpenCode is so close to my heart, let's give it a try. And to be honest, aside from AMP, the OpenCode team is the most down-to-earth and practical in this space. They won't try to wow you with features you'll never use in your life; instead they work hard to keep the core experience very stable. I also strongly agree with their thinking on what coding agents mean for our profession.
But the problem with OpenCode is that it does a terrible job of context management. For example, on each round of conversation it calls a function named SessionCompaction.prune, which deletes all records before the last 40,000 tokens. Everyone knows about prompt caching, right? Providers cache on an exact prefix of the request, so rewriting the head of the session like this destroys all your caches on every turn.
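A toy model of why that hurts, assuming the usual prefix-matching cache semantics (a simplification for illustration, not any provider's real implementation):

```typescript
// Prompt caches hit on the longest exact prefix shared with a prior request.
function cachedPrefixLength(previous: string[], current: string[]): number {
  let i = 0;
  while (i < previous.length && i < current.length && previous[i] === current[i]) i++;
  return i; // messages [0, i) can be served from cache
}

const turn1 = ["system", "msg1", "msg2", "msg3", "msg4"];

// Append-only session: everything already seen stays cached.
const turn2 = [...turn1, "msg5"];
console.log(cachedPrefixLength(turn1, turn2)); // 5: full cache hit

// Head pruned to keep only the recent window: the prefix no longer matches.
const pruned = ["system", "msg3", "msg4", "msg5"];
console.log(cachedPrefixLength(turn1, pruned)); // 1: only the system prompt hits
```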
There's an interesting story between OpenCode and Anthropic. In my view, Anthropic's subsequent stance was perfectly logical: "You can't do this." The incident never became a big public issue, and the reason is simple: if you go to a gym, ignore the rules, and abuse the equipment, you'll get blacklisted. I have no evidence, but I suspect this is why the relationship between Anthropic and OpenCode is tense. I'm entirely on Anthropic's side here. Don't abuse their infrastructure.
There are other pitfalls too. For example, OpenCode ships with LSP (Language Server Protocol) support. Suppose you give the agent a task that touches a bunch of files. How will it actually do it? It modifies them one by one. What do you think the probability is that the code compiles after the first round of edits? When you're changing code line by line, how long until it's back in a compilable state? The answer is that it isn't, for quite a while. After the first and second edits, the code is still broken.
At that point, if you ask the LSP, "Hey, I just modified this line. Is the code broken?", the LSP will of course say, "Yes, it's completely broken." This feature then appends the error output right after the tool call and feeds it back to the model: "You just did something wrong." The model is left confused: "What the hell? I'm not finished yet! You're telling me this now?" If this happens too often, the model eventually just gives up, and the results are very poor. So I really dislike attaching an LSP while the agent is working. The only time linting and type checking are really needed is when the agent thinks it has completely finished its work.
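A sketch of the workflow argued for here: hold diagnostics until the agent declares it is done. `tsc --noEmit` stands in for whatever checker the project actually uses, and the event shape is made up for illustration.

```typescript
// Run diagnostics only on "done", never on intermediate edits.
import { execFileSync } from "node:child_process";

function onAgentEvent(event: { type: "edit" | "done"; file?: string }) {
  if (event.type === "edit") return; // mid-task edits are expected to be broken

  try {
    execFileSync("npx", ["tsc", "--noEmit"], { stdio: "pipe" });
    console.log("clean: nothing to feed back");
  } catch (err: any) {
    // Only now is it useful to put errors into the model's context.
    console.log("diagnostics for the agent:\n" + err.stdout?.toString());
  }
}
```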
There's also a recent change in OpenCode: within a session, each message is saved as an independent JSON file. To me, that signals a lack of deep thinking in the overall architecture. Once I lose confidence in the underlying design, I no longer want to use the tool.
In addition, OpenCode ships with a client-server architecture by default: the client connects to a server, and the terminal UI is just one of the clients. That was supposed to be very high-end, but it turned out to carry a remote code execution (RCE) vulnerability in its default configuration. If you're that proud of your server architecture, I'd assume you're a group of mature engineers who at least considered security. Evidently they hadn't, and the vulnerability sat there for a long time. I'm not trying to assign blame; in an industry moving at this unprecedented pace, mistakes are inevitable. But I don't want to use a tool with that kind of hidden danger.
This is my read on the existing coding harnesses. AMP is genuinely good, but it takes control away from me. It even decides which model to use for which type of task, and that doesn't suit my personality.
Later, for unrelated reasons, I started looking into benchmarks and found TerminalBench. Simply put, it is an evaluation harness specifically for agents, containing a large number of tasks involving computer use and programming: about 82 very diverse tasks, from "fix my Windows settings" to "write me a Monte Carlo simulation". It has a leaderboard listing various combinations of agent harnesses and models.
Among them, an agent called Terminus really amazed me. It is one of the best-performing harnesses on the leaderboard. How does it work? The model gets nothing but a tmux session. The only things it can do are send keystrokes and read back the returned VT escape sequences. That is the most minimal, primitive interface between a model and a computer, and yet its performance is top-notch.
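The interface is small enough to sketch in a few lines. `send-keys` and `capture-pane` are real tmux subcommands; the session name and the wiring around them are illustrative.

```typescript
// The whole Terminus-style "tool surface": keystrokes in, screen out.
import { execFileSync } from "node:child_process";

const SESSION = "agent"; // arbitrary session name

// Start a detached shell the model can type into.
execFileSync("tmux", ["new-session", "-d", "-s", SESSION]);

function sendKeys(keys: string) {
  execFileSync("tmux", ["send-keys", "-t", SESSION, keys, "Enter"]);
}

function readScreen(): string {
  // -p prints the pane's current contents to stdout.
  return execFileSync("tmux", ["capture-pane", "-t", SESSION, "-p"]).toString();
}

// One model "step": type a command, wait a moment, observe the resulting screen.
sendKeys("ls -la");
setTimeout(() => console.log(readScreen()), 500);
```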
What does this tell us? Do we really need all those fancy features to make the model work?
For me personally, it's not just about whether the model is good; it's also about how the human user should interact with the agent. Terminus's user experience, or developer experience, is obviously not what I want, but it proves one thing: in existing coding harnesses, many features may not be necessary for good results. You don't need file tools, sub-agents, or web search. You don't need any of it.
Based on these findings, I arrived at two core arguments. First, we are currently in a stage of "messing around and seeing the results". No one knows what a perfect coding agent should look like. Everyone is experimenting: some take the minimalist route, others the "spaceship" route, with agent clusters or full autonomy. I think this question is still open, and industry standards have not emerged yet.
Second, we need a better way of "messing around". Coding agents must be self-modifiable and highly malleable so that we can quickly experiment with new ideas and see if we can arrive at new industry standards or workflows.
So my basic idea is very simple: strip away all the redundancy, build a minimalist and extensible core, and then add a few small features that make people feel comfortable using it. It's