Cursor's self-developed model surpasses Opus 4.6 at a drastically lower price, and the vibe-coding world is buzzing.
What on earth is going on, friend!
The new Cursor model not only outperforms Claude in capability; its price has also been slashed, and by far more than half.
As is well known, Cursor initially won over a large user base by offering the Claude models in its editor.
Now, however, it has developed its own coding model and dethroned Claude -
Its latest programming model, Composer 2, not only surpasses Claude Opus 4.6 in capabilities, but more importantly, its price has dropped significantly.
Let me put it this way: while others cut prices in half, Cursor slashed them all the way down - what Chinese netizens would call an "ankle cut".
So, the question is: how can Cursor lower the price when everyone else is "raising prices"?
(Note: With the popularity of the "Lobster" model, global token consumption by large models has grown exponentially. As a result, since the beginning of the year, cloud providers and large-model companies in China and abroad have been raising prices across the board.)
Cursor has revealed the answer -
A new reinforcement learning method.
Stronger than Opus 4.6 and the price keeps dropping!
Let's start with Composer 2, which is currently available on Cursor.
From the English name "Composer", you can probably guess that this model focuses on "composing" - code, that is (just kidding).
Given the sharp increase in token consumption for programming after the popularity of the "Lobster" model, Cursor currently has only one goal -
Cost-effectiveness, cost-effectiveness, and more cost-effectiveness.
What is cost-effectiveness? Naturally, it is "the optimal combination of intelligence and cost".
In terms of capabilities, Cursor says:
Composer 2 has made significant improvements on every benchmark we measured, including Terminal-Bench 2.0 and SWE-bench Multilingual.
For example, on Terminal-Bench 2.0, which measures an agent's terminal-operation ability, its performance currently ranks between GPT-5.4 and Claude Opus 4.6.
Moreover, judging from the iteration of the Composer line, its pace of evolution is accelerating.
In terms of pricing, the standard Composer 2 costs $0.5 per million input tokens (about 3.5 yuan) and $2.5 per million output tokens (about 17.2 yuan).
Compared with Claude Opus 4.6, that is practically rock-bottom.
Meanwhile, Cursor has also launched a "variant with the same intelligence level but faster speed" - Composer 2 Fast.
This default model is priced at $1.5 per million input tokens (about 10.3 yuan) and $7.5 per million output tokens (about 51.7 yuan).
Compared with Claude Opus 4.6, it not only keeps the price advantage but is also much faster.
According to Cursor, the key to achieving a balance between performance and price lies in the introduction of a new reinforcement learning method.
It's important to note that this is not an inference-time trick but an ability developed through actual training.
Introducing the "note-taking" reinforcement learning method
If we were to summarize this new method in one sentence, it would be:
Teach the model to "take meeting minutes for itself", so that long-horizon tasks it previously couldn't hold in memory can be carried forward step by step.
Here is what Cursor said:
Although this "self-summarization reinforcement learning" sounds a bit convoluted, the idea is actually very clear.
The core problem it solves is -
Most AI coding assistants are quite capable these days, but once tasks get longer and more complex, they start breaking down.
The reason is well known: the context window isn't big enough.
A complex engineering task often involves tens of thousands of lines of code and hundreds of steps. Since a model's context window is always finite, many such tasks simply can't be finished.
To break through the context bottleneck, the industry currently has two mainstream solutions, both centered on "compression":
Either summarize the content so far and then continue;
or slide the context window and discard the earliest context.
There are also some newer explorations, such as compressing in latent space - compressing the context into vectors instead of text (slower than text compression, but more accurate).
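To make the two mainstream strategies concrete, here is a minimal toy sketch - not Cursor's implementation - where "tokens" are plain strings and the summarizer is a stand-in for an LLM call:

```python
# Toy illustration of the two mainstream context-compression strategies.
# Tokens are modeled as plain strings; real systems operate on model tokens.

def sliding_window(context: list[str], max_len: int) -> list[str]:
    """Keep only the most recent tokens; earlier context is discarded."""
    return context[-max_len:]

def summarize_then_continue(context: list[str], max_len: int,
                            summarize) -> list[str]:
    """Replace the oldest context with a summary, keep the recent half."""
    if len(context) <= max_len:
        return context
    overflow, recent = context[:-max_len // 2], context[-max_len // 2:]
    summary = summarize(overflow)      # in practice, an LLM call
    return summary + recent

# Usage with a stand-in "summarizer" that keeps every 10th token:
ctx = [f"tok{i}" for i in range(100)]
print(len(sliding_window(ctx, 40)))                         # 40
compact = summarize_then_continue(ctx, 40, lambda t: t[::10])
print(len(compact))  # 8 summary tokens + 20 recent tokens = 28
```

Both approaches are lossy, which is exactly the failure mode the article describes next: whatever the summarizer (or the window) drops is gone for good.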
However, none of these methods is entirely reliable: they can all cause the model to forget key information in the context, degrading performance on long-running tasks.
In other words, the longer the task, the more likely the model is to drift off track.
Cursor's answer: summarization matters, and so does internalizing that summarization ability into the model itself.
So they added a "self - summary" mechanism to their model:
While working on a task, instead of being passively compressed, the model actively stops and writes itself a "phase summary" - in plain terms, it takes notes.
The specific process is roughly as follows:
1. Composer generates content from the prompt until it hits a fixed token-length trigger point.
2. A synthetic query is inserted, asking the model to summarize the current context.
3. The model is given some scratch space to think through the best summary, then produces the compressed context.
4. Composer returns to step 1 with the compressed context, which includes the summary plus the conversation state (planning state, remaining tasks, number of previous summaries, etc.).
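The four-step loop above can be sketched as follows. This is an illustrative outline, not Cursor's code: `model_generate` and `model_summarize` are hypothetical stand-ins for the actual model calls, and the trigger length is arbitrary.

```python
# Sketch of the generate -> summarize -> compress loop described above.
# model_generate / model_summarize are hypothetical stand-ins for LLM calls.

TRIGGER = 2000  # fixed token-length trigger point (illustrative value)
SUMMARY_QUERY = "Please summarize the current context."

def run_with_self_summary(prompt, model_generate, model_summarize,
                          max_rounds=170):
    context = list(prompt)
    state = {"plan": [], "remaining_tasks": [], "num_summaries": 0}
    for _ in range(max_rounds):
        # Step 1: generate until the context reaches the trigger length.
        done, context = model_generate(context, until=TRIGGER)
        if done:
            return context
        # Step 2: insert a synthetic query asking for a summary.
        # Step 3: the model produces the compressed context.
        summary = model_summarize(context + [SUMMARY_QUERY])
        state["num_summaries"] += 1
        # Step 4: new context = summary + conversation state; loop back.
        context = summary + [f"state={state}"]
    return context
```

The point of the design is that the compressed context is the *only* bridge between rounds, which is what lets the training signal below judge summary quality by downstream success.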
The key point is that the model's self-summarization ability is not an inference-time trick but is learned through training.
During reinforcement learning, summarization quality is folded into the reward:
Good summary → subsequent task more likely to succeed → higher reward
Summary with missing information → task failure → penalty
As a result, the model gradually learns what information is worth keeping and what can be discarded.
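A toy illustration of this reward structure, under my own simplifying assumptions (a task reduced to a set of "key facts", a hypothetical `KEY_FACTS` set, and a binary ±1 reward): the summary is never scored directly - it earns reward only through downstream task success, so the model learns what is worth keeping.

```python
# Toy illustration: the summary is rewarded only via downstream success,
# so training implicitly teaches which information is worth keeping.

KEY_FACTS = {"entry_point", "build_flag"}  # hypothetical task-critical facts

def task_succeeds(summary: set[str]) -> bool:
    # Downstream steps succeed only if the summary kept the key facts.
    return KEY_FACTS <= summary

def reward(summary: set[str]) -> float:
    # Good summary -> task success -> reward; missing info -> penalty.
    return 1.0 if task_succeeds(summary) else -1.0

print(reward({"entry_point", "build_flag", "chit_chat"}))  # 1.0
print(reward({"chit_chat"}))                               # -1.0
```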
You can see the specific effects by comparing it with traditional methods.
On a set of hard software-engineering tasks, the traditional summarization approach needs a summary prompt thousands of tokens long, and its compressed output is also lengthy, averaging over 5,000 tokens.
Composer's prompt, by contrast, is essentially one sentence - "Please summarize the conversation" - and its compressed output averages only about 1,000 tokens.
On the same tasks, it uses only one-fifth the tokens of the traditional method, and compression-induced errors drop by roughly 50%.
In other words, it compresses more effectively while retaining more key information.
What's more interesting is that it can really solve long - chain tasks.
Cursor presented a classic problem that stumped many models - running the Doom game on the MIPS architecture.
I have provided /app/doomgeneric/, the Doom source code. I have also written a special doomgeneric_img.c that I'd like you to use; it writes each rendered frame to /tmp/frame.bmp. Finally, I have provided vm.js, which reads a file named doomgeneric_mips and runs it. Please figure out the rest...
Since the model has to modify code, compile, debug, and go through trial and error repeatedly, many models eventually get stuck.
Composer, however, found a correct solution after 170 rounds of interaction, compressing more than 100,000 tokens down to 1,000 along the way.
In short, a series of internal tests show that:
By integrating compression into the training loop, Composer has learned an explicit mechanism for efficiently passing key information forward to later steps, making it far more capable on hard tasks.
As mentioned before, Cursor is iterating at a rapid pace: its researchers have already begun teasing Composer 3.
It can be said that Cursor now has a dual identity. As its CEO put it:
Cursor is a new kind of company - neither a pure application developer nor a pure model provider.
Will it be open-sourced? In any case, the co-founder and CEO of Hugging Face has already asked on everyone's behalf (folded-hands emoji).
Reference links:
[1]https://x.com/mntruell/status/2034729462211002505
[2]https://x.com/RoboIntellect/status/2034693646822580431?s=20
[3]https://x.com/cursor_ai/status/2033967614309835069?s=20