
The Do-or-Die Moment for AI Coding: Spec Is Eroding Human Coding, Agents' Wheel-Reinventing Is Dragging Down Efficiency, and Context Engineering Becomes the Decisive Factor as Token Costs Spiral Out of Control

极客邦科技 InfoQ · 2025-12-30 17:20
In 2025, the AI Coding ecosystem is defining a new role for programmers in 2026. The answer might be hidden in a pile of smoldering Markdown files.

In the past six months, Spec-driven development has become hugely popular, and layer upon layer of "Markdown scaffolding" for agents has quickly piled up in repositories. It is hailed as the cutting-edge answer for AI Coding: use a contract to force the agent to actually do the work.

But the question arises: Can this set of contracts really handle the complexity accumulated in decades of software engineering? Or will the ultimate value of programmers shift from "writing code" to "defining rules" — taming this technological revolution with natural language that AI can understand?

1

The Ceiling of Completion and the Inevitable Rise of Agents

The evolution of AI Coding has clearly divided into two eras.

The first wave was initiated by Copilot and Cursor: This is a human-led programming approach. The role of AI is to predict the "next character" or the "next editing position", improving speed and fluency within a local scope.

The boundaries of this paradigm are actually very clear. Completion must be smooth enough not to interrupt the programmer's flow, which means the end-to-end delay is strictly limited to the order of a few hundred milliseconds. The available model scale and context length are naturally constrained: the model parameters cannot be too large, and the context length can never be fully utilized.

Meanwhile, completion ability has kept expanding — from in-line prediction to cross-line, cross-function, and cross-file continuation and rewriting, and even local refactoring. Although the experience still has much room to improve, understanding global intent, project constraints, and dependency relationships in such a short time is close to the engineering limit: it places near-limit demands on the general completion system's post-training, context selection, inference strategy, and engineering pipeline.

In the second wave, especially in the past 6–12 months, we have witnessed a real paradigm shift: the rise of agents. It is no longer limited to predicting the "next character" but directly takes over tasks — from requirement analysis to code generation, from tool invocation to result verification.

By comparison, the completion paradigm is limited: its modifications are small in scope, and it consumes a lot of developer attention. Against the agent dialogue mode, which can generate code wholesale, the marginal utility of continuing to optimize completion is shrinking. As model capabilities and toolchains improve, agents will cover more of the stages from requirements to delivery and gradually become the main workflow; in agent-dominated scenarios, completion may step back and become one of the underlying capabilities supporting agents' fine-grained execution.

In this regard, Tian Zhu, the core developer of TRAE, pointed out that this does not mean that the completion paradigm has reached the technological ceiling. On the one hand, many developers still enjoy the process of "writing code themselves", and there is still much room for improvement in the completion experience in these scenarios. More importantly, from a capability perspective, completion always solves the same problem — predicting the most reasonable next editing action given the context. In the past, this ability was mainly used to assist human coding; in the agent system, it can also be reused to assist the execution of AI itself. For example, the dialogue panel of agents and the generation and completion of tool invocation parameters can essentially be regarded as different forms of "completion scenarios".
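
To make this "completion as a reusable primitive" idea concrete, here is a minimal sketch under assumed names (CompletionRequest and complete are invented for illustration, not TRAE's actual interface). The same predict-the-next-edit call serves both the human's editor and the agent's tool-argument filling:

```python
# Minimal sketch: one completion primitive, two consumers.
# All names here are hypothetical, not any product's API.
from dataclasses import dataclass

@dataclass
class CompletionRequest:
    context: str     # text surrounding the insertion point
    max_tokens: int  # the latency budget caps generation length

def complete(req: CompletionRequest) -> str:
    """Stand-in for a model call: predict the most plausible continuation."""
    return "<model output>"  # a real system would call an LLM here

# Consumer 1: classic editor completion -- continue the code under the cursor.
suggestion = complete(CompletionRequest("def parse_config(path):\n    ", max_tokens=64))

# Consumer 2: agent-side completion -- fill in a tool invocation's arguments.
tool_args = complete(CompletionRequest('tool: read_file\nargs: {"path": "', max_tokens=32))
```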

Additionally, this year you can observe a very interesting phenomenon: almost all leading programming tools have begun to evolve into a combination of three parallel forms: IDE, CLI, and Cloud. Many products start with one of these forms but quickly extend their reach to the other two because what users really need is not a specific interaction method but a complete chain that can deliver tasks in different scenarios. Therefore, we can more clearly understand the "origin" and characteristics of some representative tools: Claude Code originated from the CLI, so it may be stronger in the CLI; OpenAI Codex originated from the Cloud; Cursor originated from the IDE and is one of the largest players in the IDE field.

Among them, CLI and Cloud Agents are agent-dominated forms from the start. They have less demand for UIs, either working in the terminal or using a simplified web interface, along with GitHub PRs for collaboration and delivery.

However, Tian Zhu believes the IDE will remain the most widely used entry point, for a simple reason: it best fits programmers' long-established work habits. He realized in his team's earlier practice that disruptive innovation in professional productivity tools often comes with a comprehensive reshaping of developers' cognition and ways of working. In his view, the form of the IDE is likely to change fundamentally within three years and will no longer revolve around the editor — TRAE's SOLO mode and Cursor's Agent mode are precisely the industry's explorations in this direction.

To put it more simply: the IDE is evolving from a "toolbox for humans" to a "toolbox shared by AI and humans". Many human-centered capabilities in the traditional IDE are now being disassembled into smaller, clearer, and more AI-friendly tools that can be called by AI agents as needed. Thus, as technology evolves, the IDE will become more like a capability container and execution environment for agents and humans to collaborate.
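
As a concrete illustration of that disassembly, the sketch below repackages one IDE capability as an agent-callable tool using a generic function-calling-style schema. The tool name and fields are invented for illustration; no particular IDE exposes exactly this:

```python
# Hypothetical example: an IDE refactoring exposed as a declarative tool
# schema that an agent can call by name. A human triggers the same
# capability from a menu; for the agent it is just another tool.
rename_symbol_tool = {
    "name": "rename_symbol",
    "description": "Rename a symbol across the project using the IDE's refactoring engine.",
    "parameters": {
        "type": "object",
        "properties": {
            "file": {"type": "string", "description": "File containing the symbol"},
            "line": {"type": "integer"},
            "column": {"type": "integer"},
            "new_name": {"type": "string"},
        },
        "required": ["file", "line", "column", "new_name"],
    },
}
```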

These three paths ultimately converge to Agentic Behavior to varying degrees.

The IDE continues to evolve in the collaborative experience of "human + agent", the CLI strengthens agent capabilities in engineering automation and pipelines, and the Cloud Agent expands the boundaries of R&D collaboration across time and space.

Although the forms are different, the goals are highly consistent: the agent-dominated paradigm. In the agent form, everyone's core requirements converge: the ability to use tools correctly, maintain the stability of long-term tasks, and make continuous corrections based on feedback signals. Therefore, the capabilities of Coding Agents essentially boil down to the stability of long-term tasks and the ability to call tools.
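
A minimal sketch of such a loop makes both requirements visible (all names here are assumed, not taken from any product): the bounded, resumable loop is what long-horizon stability rests on, and the error fed back into the history is the signal the model corrects against:

```python
# Minimal agent-loop sketch; illustrative pseudostructure, not a real product.
def run_agent(task: str, tools: dict, model, max_steps: int = 20) -> str:
    history = [f"TASK: {task}"]
    for _ in range(max_steps):                    # bounded steps: long-horizon stability
        action = model.decide(history)            # model picks a tool call or declares success
        if action["type"] == "done":
            return action["summary"]
        try:
            observation = tools[action["tool"]](**action["args"])  # correct tool use
        except Exception as err:
            observation = f"ERROR: {err}"         # feedback signal to correct against
        history.append(f"{action['tool']} -> {observation}")
    return "stopped: step budget exhausted"
```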

When execution power shifts from humans to agents, the complexity that decades of software engineering have covered with experience and tacit understanding is forced to be pre-established as explicit rules — and that is when the Spec gets called back into service.

Screenshot from the comment section of "The Dream of Everyone Being a Programmer Should Wake Up!"

2

Can Spec Really Solve the Problems of AI Coding?

Only a few months after the term caught on, an awkward but unavoidable reality has emerged: the "Spec" everyone talks about is no longer the same thing.

Some people say that Spec is about writing better prompts, some understand it as more detailed product requirement documents, and others say it's architecture design documents. But for more engineering teams, Spec just means "using a few more Markdown files when writing code".

So you'll see gemini.md, claude.md, agent.md, cursor-rules, and various Skills quickly piling up in repositories, alongside GitHub configuration files. In the past few months, agent tools from the major vendors have been launching "reusable context frameworks": Claude Skills, Cursor Team Rules, GitHub Copilot Spaces, along with third-party tools like Tessl and the BMad Method (BMM). The entire toolchain has evolved explosively in just one year, spawning a large number of new infrastructure species.

Many teams have an intuitive feeling in practice that when AI writes code, it doesn't lack Spec but Context. So some people simply equate the two: "Spec is context engineering" or "Spec-driven development is equivalent to context engineering".

However, front-line tool teams in China tend to believe that "they are not the same thing".

In their view, Spec is more like the most critical and stable type of content in the context, serving as the "guiding context": it clarifies the goals, constraints, and acceptance criteria, equivalent to giving the agent an executable contract that clearly states what it should do and to what extent it is considered correct.

Under this division of labor, Spec solves the problem of "what we are going to build", while Context Engineering focuses on "whether the model has received enough information at this moment". Spec itself does not automatically convert into effective context, but it is often a long-term source of high-quality context — the two are highly coupled but cannot replace each other.

Therefore, Spec should not be limited to a few fixed document forms. More accurately, Spec is the sum of all the contracts used to guide code generation: product documents, design drafts, interface definitions, boundary conditions, acceptance criteria, and execution plans can all be folded into the Spec system as subsets at different stages and granularities.
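
That division of labor can be sketched as a data flow, under assumed structures (SpecItem and assemble_context are illustrative, not any product's API): spec items are the durable contracts, and context assembly selects only the subset the model needs for the task at hand:

```python
# Sketch: Spec feeds context but is not the context.
from dataclasses import dataclass

@dataclass
class SpecItem:
    kind: str    # "requirement" | "interface" | "acceptance" | "plan" ...
    stage: str   # which stage of the lifecycle this contract belongs to
    body: str    # the contract text itself

def assemble_context(spec: list[SpecItem], task_stage: str,
                     scratch: list[str], budget_chars: int) -> str:
    """Select the relevant spec subset, then fill the remaining budget
    with volatile working state (diffs, tool output, recent errors)."""
    parts = [s.body for s in spec if s.stage == task_stage] + scratch
    out, used = [], 0
    for p in parts:
        if used + len(p) > budget_chars:
            break                  # context engineering = deciding what fits *now*
        out.append(p)
        used += len(p)
    return "\n\n".join(out)
```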

However, due to its "wide coverage, multiple forms, and long lifecycle", Spec is particularly difficult to standardize.

In this round of discussion on Spec-driven development, Kiro is often regarded as one of its important promoters. Its technical director, Al Harris, mentioned in a public talk that, to find a suitable Spec form, the team internally tried about seven different implementations — from ephemeral specs and hierarchical specs to TDD-based specs, all but "appending the spec suffix to everything". In the end, they were repeatedly answering three questions: when a Spec should be "finalized", how detailed it should be, and how to keep the finalized content unchanged during iteration.

However, he also emphasized that the implementation of this Spec-driven development is still evolving, and the direction they ultimately want to bet on is to make Spec cover the entire SDLC chain, including requirements, design, task decomposition, and verification mechanisms, so as to bring the rigor of traditional software engineering back to AI development.

Behind this, we cannot avoid a key question: Spec needs to handle the complexity accumulated in decades of software engineering.

In the view of Huang Guangmin, the product director of CodeBuddy, the standard of Spec is essentially the concretization of software engineering theory in AI programming tools.

However, the problem is that after years of development, software engineering theory has not produced a unified standard that applies universally in production practice. Different Spec variants are therefore weighing different tensions (such as flexibility versus rigor), and the optimal granularity also varies with the task.

He believes that whether a Spec standard is effective does indeed depend on the application scenario, because a Spec essentially trades a document or structure for three things: correctness, efficiency, and maintenance cost. Different scenarios weight these three factors differently, so there will not be a single standard but several forms with relatively high acceptance.

Is Spec the Return of the "Eliminated" Waterfall Process?

Software engineering is a highly uncertain and complex system. In long-term tasks, large models may hallucinate, forget, or experience goal drift; if there is no mechanism to compensate, agents are likely to deviate more and more, and the rework cost will increase rapidly. That's why Spec has become attractive again — it attempts to more clearly fix the key goals, constraints, and acceptance criteria.

However, controversy has also arisen. An agile practitioner once said bluntly that the direction of SDD (Spec-driven development) is wrong. In his view, this method tries to solve a problem that has already been proven infeasible: removing developers from the software development process. In this scenario, programming agents replace developers, and careful up-front planning is supposed to guarantee that the agents succeed. This is almost identical to what the waterfall model demands: a large amount of documentation completed before coding, with developers merely converting specifications into code.

The problem is that software development is a non-deterministic process, and planning cannot eliminate uncertainty, as the classic paper "No Silver Bullet" pointed out. Agile methods long ago abandoned development centered on a mass of up-front documents. So, will AI Coding bring a "return of the waterfall process"?

When discussing Spec from an engineering perspective, the common focus is not "whether to write down all the thoughts" but "which parts should be structured". In Huang Guangmin's view, what Spec Coding really wants to structure is not the developer's entire thinking process but the parts that are most likely to go wrong in long-term tasks and are most worth verifying and preserving.

The industry is still in the exploratory stage regarding Spec. A more reasonable form of Spec is a "living contract". It is not a static, one-off document but a key intermediate state in the Plan-Execute closed loop: good Spec-driven practice is not "write a perfect Spec first, then start writing code" but rather clarifying the correctness criteria with the Spec and then continuously calibrating the consistency between the Spec and the code artifacts through the reasoning-execution-feedback cycle. "This is actually closer to the real state of engineering than traditional development. Requirements, constraints, and implementations will change. The key is to make these changes traceable, verifiable, and rollbackable."
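
A minimal sketch of that Plan-Execute closed loop, with generate, verify, and revise_spec as hypothetical stand-ins rather than any framework's API, might look like this:

```python
# "Living contract" sketch: the Spec fixes the correctness criteria, and each
# round recalibrates spec and code against each other. Illustrative only.
def living_contract_loop(spec, generate, verify, revise_spec, max_rounds: int = 5):
    for _ in range(max_rounds):
        artifact = generate(spec)              # plan -> execute against the current contract
        report = verify(artifact, spec)        # check against the spec's acceptance criteria
        if report["passed"]:
            return artifact, spec              # code and contract are consistent
        # Drift detected: amend the contract (or its plan) and record the change,
        # so the change stays traceable, verifiable, and rollbackable.
        spec = revise_spec(spec, report["failures"])
    raise RuntimeError("spec and artifact failed to converge within budget")
```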

Taking a longer-term view, this discussion naturally leads to an age-old question in software engineering. Tian Zhu mentioned in an exchange that some explorations in AI Coding reminded him of a goal that software engineering has repeatedly pursued but never fully achieved: Is there a sufficiently complete and verifiable description that can clearly define how a system operates and can be reliably reproduced?

Under such a vision, Spec is no longer just an up-front explanation or process record of the code but is placed in a higher position — it may gradually evolve into an engineering artifact more abstract and more stable than the code itself.

If we look at the entire development history of software engineering, from the initial 0s and 1s to assembly languages, high - level programming languages, DSLs, and various configuration and declarative specifications, it is essentially a continuous improvement in the system's ability to express "human intentions" and the level of abstraction. Along this path, Spec is more like an attempt at the next abstraction upgrade at the natural language level.

Of course, this path is not easy. The ambiguity of natural language makes it difficult to be directly engineered, and it is destined to be a challenging exploration path without a mature paradigm. But precisely because of this, it has become a direction that the industry is constantly exploring and gradually advancing, rather than a conclusion that has been proven or denied.

Software Abstraction: Why Do Agents Always Like to "Reinvent the Wheel"?

In the practice of AI Coding, there is a long-standing complaint that many developers have voiced repeatedly: Coding Agents strongly prefer to "implement functionality from scratch" rather than reuse mature libraries.

On the one hand, the industry is exploring higher-level forms of abstraction; on the other, it seems to be bypassing the abstraction and reuse systems accumulated over decades of software engineering, especially libraries and interfaces that have already been verified and optimized. In real-world development, it is rare to start a brand-new greenfield project from scratch. More often, developers are modifying and extending existing applications, building on existing open-source libraries, or doing patch work.

However, for the model, "writing a working version by itself" is often the path with the lowest risk. When it is uncertain about the version, usage, or boundaries of a library, falling back to "implementing it by itself" is almost an inevitable choice. On the one hand, the distribution of pre - trained corpora at the library level is not balanced; on the other hand, the rewards during the alignment phase are more inclined to "running correctly" rather than "prioritizing the reuse of existing abstractions".

Add to this the freshness problem of library knowledge: versions update frequently, APIs change often, and documentation is missing or even contradictory. Even if the user explicitly specifies a library, the model may still use it incorrectly when this key information is not accurately placed into the context.

However, this is not an unsolvable problem. The key is not to repeatedly correct the agent by hand or micro-manage it but to supplement the information sources it can rely on. As Huang Guangmin emphasized, instead of constantly "reminding" at the result stage, it is better to prepare the information at the execution stage: use MCP tools like Context7 to supply versions, usage, and examples, and inject the correct usage into the task context through "progressive disclosure"; for internal enterprise component libraries, systematically accumulate documentation, examples, best practices, and usage boundaries so that agents can reliably reuse these abstractions instead of constantly starting from scratch.
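
A sketch of "prepare the information at the execution stage", assuming a hypothetical fetch_docs helper (a stand-in for a documentation source such as an MCP server; this is not Context7's actual API):

```python
# Progressive disclosure sketch: pin the version, then reveal usage docs
# layer by layer instead of dumping everything up front. Names are invented.
def prepare_library_context(library: str, version: str,
                            subtopics: list[str], fetch_docs) -> list[str]:
    layers = [f"Use {library}=={version}; do not reimplement its features."]
    for topic in subtopics:
        # each layer carries the version-correct usage and examples for one need
        layers.append(fetch_docs(library, version, topic))
    return layers

# Usage idea: inject layers[0] immediately; disclose deeper layers only when
# the agent actually touches that part of the API.
```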

When new abstractions are not mature, old abstractions are bypassed, and the context continues to expand, all these problems will ultimately converge in one place — runtime cost.

3

The Rise of Token Engineering: Why Did the Cost Suddenly Get Out of Control?

When did tokens start to become a headache? Not when you first used up your free quota, but when you realized that it is no longer just "the consumption of a single conversation" but a core variable that can directly influence tool pricing, product strategies, and even force platforms to change their rules.
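
A back-of-the-envelope calculation shows how quickly agent workloads escalate from "one conversation's consumption" into a pricing problem. Every price and volume below is an invented placeholder, not any vendor's real rate:

```python
# Toy cost model: monthly spend for one heavy agent user. Illustrative numbers only.
def monthly_cost(tasks_per_day: int, tokens_in: int, tokens_out: int,
                 price_in_per_m: float, price_out_per_m: float, days: int = 30) -> float:
    per_task = tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m
    return tasks_per_day * days * per_task

# 40 agent tasks a day, 200k input / 20k output tokens each, at hypothetical
# $3 / $15 per million tokens: roughly $1,080 a month -- far beyond a low
# flat-rate subscription.
print(f"${monthly_cost(40, 200_000, 20_000, 3.0, 15.0):,.0f}/month")
```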

This year, two events brought this issue to the forefront almost simultaneously. One was Cursor: some users on a relatively cheap subscription plan found a way to run up far more usage than the platform had budgeted for, blowing straight through its cost assumptions. Within less than a year, Cursor then raised prices and cut features five times in a row to stem the losses.

The other event occurred on the Token leaderboard of Claude Code: The top global spender on the leaderboard burned 7.7