
Cursor: Please love me one more time.

Silicon Star Pro, 2026-05-25 09:29
As model capabilities converge, the ability to build a good shell around them becomes more important.

In the first stage of AI coding, the most believable story was the closed-loop advantage of "native model + native application."

Claude Code is backed by Anthropic and gets first access to the most powerful Claude models. Model capabilities, context window, and tool invocation can all be optimized end-to-end. Every layer, including training data, inference parameters, and tool protocols, can be tuned specifically for coding without having to accommodate any third-party API.

In contrast, Cursor looks more like a "shell" product wrapped around other companies' models. Even with an excellent user experience, it is easily dismissed as only temporarily ahead: once the native teams bring their model advantages to bear, or raise third-party API prices to capture the market, application-layer products would be squeezed out of business.

However, it seems that this judgment is no longer valid.

In the recently updated rankings from Artificial Analysis, Cursor CLI and Claude Code both run Claude Opus 4.7 (medium) and post overall scores of 61 and 60, respectively. The exact numbers matter less than what they illustrate: the so-called "native" advantage is gradually being offset by engineering accumulated at the application layer.

With the same model and similar results, Cursor's "shell" approach delivers an experience on par with that of native products.

This is where Cursor's opportunity for a counterattack lies.

1

Make the model a replaceable component

Instead of proving that its model is stronger, Cursor addresses the "shell" criticism by making the model less important.

It builds a highly efficient system around the model. Context management, codebase understanding, and coordination between the IDE and the CLI do not depend on any single model, yet they determine whether an agent task executes successfully.

In Cursor 3, released in April, the Agents Window was moved to a central position. Developers can schedule multiple Agents at once from the same interface, running them locally, in worktrees, in the cloud, over remote SSH, and across different repositories.

Subsequent updates have gradually improved the system in this direction. The Cursor SDK opens up the Agent runtime to developers, allowing enterprises to integrate Agents into their internal tools. Cloud Agents have added multi-repository support and audit logs, addressing the security and compliance concerns of enterprise users.

At the same time, Cursor is also moving the task entry points out of the IDE. In the future, tasks may not necessarily start from the editor; they could come from an idea or a message.

Cursor aims to funnel these entry points into the Agent system automatically and ultimately present the results to developers as diffs, test results, and pull requests.

Moving from an AI programming tool to an engineering system centered on Agents: that is the real ambition behind Cursor's latest update.

Once this system is established, the model itself becomes replaceable.

If Claude is powerful, integrate Claude; if GPT is powerful, integrate GPT; if an open-source model is useful, incorporate it into the same workflow.
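
One way to picture "the model as a replaceable component" is a thin interface that the rest of the agent system depends on. The Python sketch below is illustrative only: `ModelBackend`, `StubBackend`, `build_context`, and `run_agent_task` are hypothetical names, not part of any Cursor API, and the stub backends stand in for real model providers.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubBackend:
    """Stand-in for a real provider client (assumption: no network calls)."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] patch for: {prompt}"


def build_context(task: str, files: list[str]) -> str:
    """Model-independent step: assemble the context the agent sends."""
    return f"Task: {task}\nFiles: {', '.join(files)}"


def run_agent_task(backend: ModelBackend, task: str, files: list[str]) -> str:
    """The surrounding system stays the same whichever backend is plugged in."""
    prompt = build_context(task, files)
    return backend.complete(prompt)


if __name__ == "__main__":
    # Swapping the model is a one-argument change; the workflow is untouched.
    for backend in (StubBackend("model-a"), StubBackend("model-b")):
        print(run_agent_task(backend, "rename function", ["app.py"]))
```

In this sketch the context-building step never inspects which backend it is feeding, which is the property the article attributes to the system built around the model.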

Moreover, as the capabilities of top models converge, the difference users perceive between integrating Claude Opus 4.7 and GPT 5.5 is shrinking in many real development tasks.

When "whose model is stronger" is no longer the decisive factor, the user's selection logic changes. Users are no longer tied to a particular model; they care about who can best orchestrate the capabilities of different models.

The layer that was previously underestimated as a "shell" is now becoming the core factor in users' choices.

2

Not elegant, but effective

After solving the problem of "being replaced," Cursor faces another more fundamental dilemma: it is not profitable.

Its business model inherently has an awkward cycle: the better the tool, the more users will call it, and the higher the cost of the underlying model API will be.

Moreover, the coding agent scenario itself involves heavy token usage, frequent tool calls, and a high retry rate.

Like many AI coding startups that rely on third-party models, Cursor was running a negative gross margin until recently. According to a later report by The Information, Cursor's gross margin was approximately -23% in the quarter ending January 2026, and it only barely turned positive afterward.
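
To make the -23% figure concrete, the short sketch below applies the standard definition of gross margin, (revenue - cost of revenue) / revenue. The function names are illustrative, and the only number taken from the article is the reported margin itself. Under that definition, a margin of -23% means roughly $1.23 of cost for every $1 of revenue.

```python
def gross_margin(revenue: float, cost_of_revenue: float) -> float:
    """Gross margin as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cost_of_revenue) / revenue


def cost_per_revenue_dollar(margin: float) -> float:
    """Invert the definition for revenue = 1: cost = 1 - margin."""
    return 1.0 - margin


if __name__ == "__main__":
    # Reported margin from the article; everything else is arithmetic.
    print(f"{cost_per_revenue_dollar(-0.23):.2f}")
```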

The turning point came from Cursor's self-trained Composer series of models.

Cursor's approach is not to build a top-tier foundation model from scratch but, more pragmatically, to use its own models to take over a large share of routine coding agent tasks and reduce its reliance on upstream APIs.

Tasks that do not require the most advanced reasoning capabilities, such as routine code completion, formatting, and simple refactoring, are handled by Composer, leaving the expensive API calls for scenarios that truly need them.

The results were quickly evident. Cursor's large enterprise accounts have achieved positive gross margins. Although individual developer accounts are still in the red, the overall situation has improved.

The latest Composer 2.5 continues this logic. Cursor acknowledges that it is built on the Kimi K2.5 base model and trained specifically for long-horizon programming tasks, with 25 times the synthetic data volume of the previous generation.

Choosing an open-source base over training one in-house, and specialized fine-tuning over all-round training, keeps costs down at every step.

This mechanism ultimately results in an acceptable cost structure.

The most complex requirements are handed to cutting-edge models like Claude and GPT, while the most frequent and standardized tasks are handled by Cursor's own Composer.
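
The division of labor described here can be sketched as a simple router. Everything in the example is an assumption made for illustration: the task categories, the two model tiers, and the per-task prices are invented, and the source does not describe how Cursor actually routes requests.

```python
from dataclasses import dataclass

# Hypothetical task categories treated as routine in this sketch.
ROUTINE_TASKS = {"completion", "formatting", "simple_refactor"}


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_task: float  # made-up average cost in dollars


IN_HOUSE = Model("in-house", 0.01)
FRONTIER = Model("frontier", 0.20)


def route(task_kind: str) -> Model:
    """Send routine work to the cheaper model, everything else upstream."""
    return IN_HOUSE if task_kind in ROUTINE_TASKS else FRONTIER


def total_cost(task_kinds: list[str]) -> float:
    """Sum the per-task cost of whichever model each task is routed to."""
    return sum(route(kind).cost_per_task for kind in task_kinds)


if __name__ == "__main__":
    workload = ["completion"] * 8 + ["architecture_change"] * 2
    routed = total_cost(workload)
    all_frontier = FRONTIER.cost_per_task * len(workload)
    print(f"routed: ${routed:.2f}, all frontier: ${all_frontier:.2f}")
```

With these invented prices, the more of the workload that counts as routine, the larger the gap between the routed cost and the all-frontier cost, which is the mechanism the article credits for the improved cost structure.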

Combined with Cursor's own system, the more specific the requirements, the more room there is for training specialized models, and the lower the reliance on upstream models.

3

The qualification for reevaluation

In a sense, Cursor is accomplishing a noble task in an inelegant way.

It does not insist on proving "my model is stronger than yours," nor does it try to compete head-on with Anthropic and OpenAI in basic research. It accepts its position and, from that position, maximizes what can be done at the application layer.

AI foundation models are moving from "winner takes all" to "multipolar coexistence." When no single model can dominate every scenario, engineering capability at the application layer becomes the real determinant of user retention.

Whoever can make more efficient, stable, and cost-effective use of limited model capabilities will win the real competition.

This competition is not over yet. Claude Code will not sit idle. The ceiling of model capabilities is still rising, and the native teams are also accelerating their investment in tool invocation and context optimization.

How long Cursor's window of opportunity stays open depends on two things: whether its engineering at the application layer can keep its lead, and whether it can hold out, with a cost structure that is not yet fully healthy, until the market landscape truly stabilizes.

But at least for now, it has regained the market's trust.

In the AI industry, being able to survive long enough to be reevaluated is already a victory in itself.

This article is from the WeChat official account "Silicon Star Pro." Author: Dong Daoli. Republished by 36Kr with permission.