Cursor Kneels and Surrenders with an Open-Source Technology Report: Fine-Tuning Kimi This Way Can Defeat Claude
The matter of Cursor repackaging Kimi isn't over yet...
According to the latest news, Cursor has released a technical report on Composer 2, going to great lengths to prove that it is still "self-developing". (doge)
It's not pure repackaging, you see, but repackaging with technology, step by step.
The method is still the pretraining + reinforcement learning recipe they have emphasized from the beginning.
This time, however, Cursor has learned its lesson and honestly credited Kimi K2.5.
They kneeled so quickly and so sincerely... they even reached a reconciliation with the official Kimi team.
But netizens don't seem to buy it.
Cursor: This Is How We Built on Kimi K2.5
At the beginning of the report, Cursor finally got on the right track and first praised its peer Kimi:
Before training, we evaluated several candidate open-source foundation models, including GLM5, Kimi K2.5, and DeepSeek V3.2, and Kimi K2.5 came out on top!
Kimi K2.5 was chosen not only for its outstanding overall capability but also for additional factors such as how efficiently it runs on Cursor's self-developed infrastructure.
(Ahem) It can be said that after this incident, Cursor has finally, thoroughly learned to respect Chinese open-source models.
Next, on top of Kimi K2.5, Composer 2 went through two separate training stages: continued pretraining and asynchronous reinforcement learning.
1. Continued Pretraining
Its purpose is to strengthen the model's domain knowledge and latent coding ability and lay the foundation for the subsequent agentic RL training. It has three sub-stages:
First, most of the compute is spent training at a 32k-token sequence length. Then a brief long-context extension phase raises the sequence length to 256k. Finally, small-scale supervised fine-tuning (SFT) adapts the model to specific coding tasks.
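The three sub-stages might be summarized as a simple schedule. Aside from the 32k and 256k sequence lengths, every number below (the compute shares in particular) is an invented placeholder for illustration:

```python
# Hypothetical training schedule for the three sub-stages described above.
# Only the 32k -> 256k sequence lengths come from the report; the compute
# shares are invented purely to make the sketch concrete.
SCHEDULE = [
    {"stage": "bulk pretraining",       "seq_len": 32_768,  "compute_share": 0.85},
    {"stage": "long-context extension", "seq_len": 262_144, "compute_share": 0.10},
    {"stage": "task-adaptation SFT",    "seq_len": 262_144, "compute_share": 0.05},
]

# Sanity check: the stages should account for all of the compute budget.
assert abs(sum(s["compute_share"] for s in SCHEDULE) - 1.0) < 1e-9

for s in SCHEDULE:
    print(f'{s["stage"]}: seq_len={s["seq_len"]}, share={s["compute_share"]:.0%}')
```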
In addition, to speed up online inference, a multi-token prediction (MTP) layer is added, combined with speculative decoding and a self-distillation strategy to keep convergence on track.
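Speculative decoding itself can be sketched in a few lines. The toy vocabulary, probabilities, and draft length below are all invented; the point is the accept/reject rule, which is what lets a cheap draft head (such as an MTP layer) propose tokens without changing the target model's output distribution:

```python
import random

random.seed(0)

# Toy vocabulary and two next-token distributions: a cheap "draft" model
# (e.g. an MTP head) and the full "target" model. All probabilities here
# are invented purely for illustration.
VOCAB = ["a", "b", "c"]
def draft_p(tok):  return {"a": 0.6, "b": 0.3, "c": 0.1}[tok]
def target_p(tok): return {"a": 0.5, "b": 0.4, "c": 0.1}[tok]

def speculative_step(k=4):
    """Draft k tokens cheaply, then accept/reject each against the target.

    A drafted token t is kept with probability min(1, target_p(t)/draft_p(t)),
    which preserves the target model's sampling distribution.
    """
    drafted = random.choices(VOCAB, weights=[draft_p(t) for t in VOCAB], k=k)
    accepted = []
    for t in drafted:
        if random.random() < min(1.0, target_p(t) / draft_p(t)):
            accepted.append(t)
        else:
            # First rejection: stop here. A full implementation would also
            # resample one token from an adjusted target distribution.
            break
    return accepted

print(speculative_step())
```

When the draft distribution tracks the target well, most of the k drafted tokens are accepted, so each target-model pass yields several tokens instead of one.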
During training, the model's loss on Cursor's internal codebases declines log-linearly, and codebase perplexity correlates positively with downstream RL performance, evidence that the pretraining is working.
2. Asynchronous Reinforcement Learning
The training environment closely mirrors real Cursor conversation scenarios and covers the core tasks of software engineering.
The overall RL training framework is based on large-scale policy gradients. For stability, a single-prompt, multi-sample policy gradient algorithm with a fixed sample group size is adopted, and each prompt participates in only one training update. The Adam optimizer updates all model parameters. The GRPO algorithm is then modified by removing the length normalization term to avoid length bias, and KL-divergence regularization is added using the k1 estimator (k1 = -log r).
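Under those stated modifications, the per-sample loss terms might look roughly like this. The group-normalized advantage and the exact surrogate form are assumptions based on standard GRPO, not Cursor's actual code; only the "no length normalization" and "k1 KL estimator" details come from the report:

```python
def grpo_loss_terms(rewards, logp_new, logp_ref, beta=0.01):
    """Sketch of the modified GRPO objective described above (assumed details).

    - Group-relative advantage: each sample's reward minus the group mean,
      divided by the group std (the fixed "sample group" for one prompt).
    - No per-token length normalization, to avoid length bias.
    - KL regularization via the k1 estimator: k1 = -log r with
      r = p_ref / p_new, i.e. k1 = logp_new - logp_ref.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5 or 1.0
    advantages = [(r - mean) / std for r in rewards]
    terms = []
    for adv, lp_new, lp_ref in zip(advantages, logp_new, logp_ref):
        k1 = lp_new - lp_ref  # -log r, the simple unbiased KL estimator
        # REINFORCE-style surrogate (to minimize) plus the KL penalty.
        terms.append(-adv * lp_new + beta * k1)
    return advantages, terms

# One group of 4 samples for a single prompt, with made-up rewards and
# sequence log-probabilities.
adv, terms = grpo_loss_terms([1.0, 0.0, 0.5, 0.5],
                             [-2.0, -3.0, -2.5, -2.2],
                             [-2.1, -2.9, -2.5, -2.3])
print(adv)
```

Because the surrogate sums over whole sequences rather than averaging per token, longer responses are neither rewarded nor penalized by the normalization itself, which is the length-bias fix the report describes.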
The report also shows that the final model's average performance and best-of-K performance improve together, suggesting that RL not only re-weights reasoning paths but also expands the coverage of correct solutions.
Composer 2 additionally uses a series of auxiliary reward mechanisms, including positive rewards for code style and interactive expression, as well as product-level penalties for improper tool calls; the reward rules are adjusted dynamically based on behaviors that emerge during training.
For benchmarking, Cursor also presents a self-developed internal evaluation set, CursorBench.
The tasks in CursorBench all come from real agent usage scenarios. Functional correctness is no longer the only criterion; the model is also scored along dimensions such as code quality, execution efficiency, and agent interaction.
The data show that CursorBench edits are larger (median 181 lines changed), versus only 7-10 lines in the SWE-bench verified set and its multilingual version. CursorBench prompts are also more concise, with a median of just 390 characters, far below the 1185-3055 characters of public benchmarks.
Concretely, Composer 2 reaches 61.3% accuracy on CursorBench-3, a relative improvement of 37% over version 1.5 and 61% over version 1.
Its accuracy is also substantially higher than Kimi K2.5's.
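A quick sanity check on those relative figures: working backwards, they imply approximate absolute accuracies for the earlier Composer versions. These back-computed numbers are ours, not quoted from the report:

```python
# A "relative increase of 37%" from a baseline b to 61.3% means
# 61.3 = b * 1.37, so b = 61.3 / 1.37. Same logic for the 61% figure.
composer2 = 61.3
for name, rel in [("Composer 1.5", 0.37), ("Composer 1", 0.61)]:
    implied = composer2 / (1 + rel)
    print(f"{name}: ~{implied:.1f}%")  # roughly 44.7% and 38.1%
```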
Overall, Composer 2 sits on the cost-accuracy Pareto frontier: its inference cost is comparable to smaller models while its accuracy matches large frontier models, and its token efficiency is on par with other SOTA models, with no extra resource consumption.
To be honest, isn't this just a Pro version of Kimi K2.5, produced by Cursor across the ocean?
With an open-source foundation model and a public technical report, Cursor can also be said to be "open-source" in another sense. (doge)
The world is welcome to repackage and build on Chinese open-source models, just remember to give credit~
Yang Zhilin's Rethinking of Large Models
While Cursor issues a report to justify itself, Kimi, for its part, has already moved on:
In a speech at the Zhongguancun Forum, Yang Zhilin shared in detail his and the Kimi team's latest thinking on open-source models and model training.
First of all, he believes that the essence of large models is to convert energy into intelligence, and the most important thing is to scale up.
In other words, it is necessary to convert as much energy as possible into more and higher - level intelligence through computing power and model carriers.
Large-model scaling, often referred to via the Scaling Law, is not about blindly adding compute; it requires method and efficiency.
Kimi's Scaling strategy lies in three points:
1. Improve Token Efficiency.
A truly powerful model doesn't win by having more compute or piling up more data, but by learning more intelligence from the same limited data.
2. Expand Context Length.
A model that can handle longer contexts can follow more complex, longer-range logic and thus complete more complex tasks.
To this end, Kimi designed a new network architecture, Kimi Linear, along with matching training data, to fundamentally improve long-context ability rather than crudely stretching the window.
3. Introduce Agent Clusters.
This is a new idea introduced in Kimi K2.5: instead of perfecting a single model, a group of agents collaborates to solve more complex problems.
The agent cluster then enables large-scale input, output, execution, and orchestration.
At the same time, Kimi also believes that a good underlying network architecture is also very important.
For example, their newly open-sourced architecture, Attention Residuals, can be seen as an LSTM-style variant that applies attention along network depth, letting the model use information from all layers more efficiently.
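As a rough illustration of what "attention along depth" could mean (this is our guess at the general idea, not Kimi's actual Attention Residuals code): the current layer's state forms a query, and every earlier layer's output serves as a key and value, so the residual stream softly mixes information from all depths:

```python
import numpy as np

rng = np.random.default_rng(0)

def depth_attention(layer_states, w_q, w_k):
    """Illustrative sketch only: attend over layer depth, not sequence length.

    The last entry of layer_states (the current layer's output) forms a
    query; each layer's output forms a key. The result is a softmax-weighted
    mix of all layers' outputs, i.e. a learned, content-dependent residual.
    """
    h = layer_states[-1]                                 # current layer, shape (d,)
    q = w_q @ h                                          # query from current layer
    keys = np.stack([w_k @ s for s in layer_states])     # one key per layer
    scores = keys @ q / np.sqrt(len(q))                  # scaled dot-product over depth
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over layers
    return weights @ np.stack(layer_states)              # depth-weighted mixture

d = 8
states = [rng.standard_normal(d) for _ in range(4)]      # 4 layers' outputs
w_q = rng.standard_normal((d, d))
w_k = rng.standard_normal((d, d))
out = depth_attention(states, w_q, w_k)
print(out.shape)
```

Compared with a plain residual connection, which adds layer outputs with fixed weight 1, this lets the network decide per input how much each depth contributes.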
In fact, the attention architecture and residual learning are classic techniques from many years ago. Now that compute is stronger and research is more engineering-driven and verified at scale, we can't rely solely on theoretical ideas, and past standard answers can be re-challenged and improved.
As for open source: do it, and keep doing it vigorously.
Open-source models are gradually becoming the new standard; models represented by Kimi K2.5 have become the benchmarks chip makers worldwide use to test hardware performance.
Many research institutions around the world are already building on Kimi K2.5; the hope is that open source gives everyone access to intelligence at a very low threshold.
Ultimately, this can form an open-source ecosystem that jointly advances the AI field.
It has to be said: for true open source, look at domestic models~
Finally, Yang Zhilin also asserted:
Large - model training has entered the third stage!
In 2023 and 2024, large-model training relied mainly on natural data, supplemented by a small amount of manual annotation. In 2025, the industry shifted its focus to manually curating high-quality tasks and building large-scale reinforcement learning systems.
Starting in 2026, the entire AI R&D process will change fundamentally: the subject of R&D will shift from humans to AI. AI will automatically synthesize tasks, build training environments, and even explore new model architectures, while researchers mainly supply compute and token resources.
Put simply, the progression runs from humans collecting data → humans selecting tasks → AI taking over the whole training process, with AI gradually going from trainee to participant and even leader of R&D.
In the future, the R & D speed in the AI field will continue to accelerate at a pace far beyond our imagination.
Reference Links:
[1]https://x.com/cursor_ai/status/2036566134468542651
[2]https://cursor.com/resources/Composer2.pdf
[3]https://mp.weixin.qq.com/s/GjN_dx380VnUmRWHGRajiA