
Kimi doesn't have the fate of DeepSeek.

Lanxi · 2026-03-30 18:27
Everyone has their own fate.

2024

Perhaps you still remember that in the not-so-distant year of 2024, Kimi was the rising star of China's AI scene: it raised 1 billion US dollars in financing, its concept stocks hit the daily limit-up repeatedly, its model accepting 2 million words of input was said to outperform GPT, and it pursued an aggressive growth strategy built on heavy advertising spend.

However, the exposure it received at that time far exceeded its technological influence.

In reality, very few people ever got to use that 2-million-word model. It later emerged that it was an experimental model whose cost per run approached three digits, making it impossible to serve users at scale.

At the time, the tech community looked down on Kimi.

Still, on the strength of the 2-million-word gimmick, Kimi cemented the "long-text" label in users' minds.

2025

At the beginning of 2025, DeepSeek burst onto the scene and, on the strength of its technology, became the true standard-bearer of Chinese AI.

By mid-2025, there had been no news of Kimi raising money for almost a year. The dominant narrative was that it was "finished". Employees began to leave, and the industry had all but pronounced a death sentence on the startup.

If you had gone into seclusion for nine months starting in mid-2025 and then saw the news in March 2026:

Kimi's latest valuation is 120 billion yuan;

Kimi's revenue over 20 days exceeded its revenue for the entire previous year;

Kimi's model had been repackaged by Cursor, the hottest AI programming tool, valued at 350 billion yuan (having been in seclusion for nine months, you wouldn't know that the hottest AI programming tool was by then Claude Code, and Cursor had slipped to second place);

Kimi's new model had been adopted as the primary model by Cloudflare, a company that carries 20% of Internet traffic and has a market value of over 500 billion yuan;

Kimi's new model had become the only open-source model, and the only Chinese model, adopted by Perplexity, the world's largest independent AI search application;

Kimi's new technique, "Attention Residuals", had begun to reshape a foundation of the deep-learning architecture that had stood unchanged for more than a decade, drawing praise from former OpenAI co-founder Andrej Karpathy, Jerry Tworek, the father of OpenAI's reasoning models, and Elon Musk;

Yang Zhilin had become the only representative of an independent large-model company worldwide invited to speak at NVIDIA's GTC 2026 annual conference...

You would probably be extremely shocked.

People say that "one day in the AI world is like one year in the real world". A lot of things did happen in the AI field in 9 months.

But ultimately it all comes down to one thing: the paradigm of AI technology has changed. The most common, and laziest, way to summarize that change is: from Chat to Agent.

For the 30 million programmers around the world, the change is that the most popular tool has shifted from Cursor to Claude Code.

For early adopters who are always the first to embrace new technology, the change is that they now open a black-and-white, DOS-like command-line terminal far more often...

For AI companies, the change is the gradual realization that models that are merely better at chatting are worth far less than models that can write code and call tools.

The coolest product has changed from ChatGPT to Claude Code, and the coolest startup has changed from OpenAI to Anthropic.

Let's go back to the Chinese market at the beginning of 2025.

DeepSeek R1 became wildly popular because it replicated, and open-sourced, the "deep thinking" ability of OpenAI's o1. Another product, the "general Agent" Manus, also burst onto the scene...

At the time, most Chinese AI companies were busy replicating DeepSeek R1 and shipping new models with "deep thinking" capabilities. Only a few realized that the model behind Manus was what was really worth pouring resources into replicating. Or perhaps they did realize it, but failed to commit enough resources or find the right approach.

One of Manus's great contributions is that it made the Claude model's multi-round tool-calling ability visible. As a technical expert at a large-model company wrote on his blog, "Most Agent products are nothing without Claude."

It was not until July 2025 that China's first model focused on Agent capabilities quietly arrived. On July 11, Kimi K2 was released under the banner of Open Agentic Intelligence. The ambition was obvious: to replicate the Agent capabilities of the Claude model and open-source them, just as DeepSeek R1 had replicated and open-sourced OpenAI o1.

Five days after the release, on July 16, the British scientific journal Nature recognized the model's value and described it as "another DeepSeek moment".

Ten days after the release, on July 21, Anthropic co-founder Jack Clark introduced K2 on his blog and commented:

In my view, Kimi is a decent model. It lags the US frontier by a few months and follows DeepSeek's trajectory. Its coding and tool-calling scores are high enough that I expect people will actually use it in the real world, so its adoption will be a good gauge of its competitiveness.

At the end of July, in a podcast interview, Yang Zhilin explained why K2 did not go after "deep thinking" first but instead focused on the programming and tool-calling capabilities an Agent requires. He used the phrase "brain in a vat" to describe models that only think deeply. Incidentally, that interview is worth reading several times: he discussed far more fundamental questions at the technical level, such as the relationship between programming and Agents, and between thinking and tool calling.

Thanks to the performance of K2 and the subsequent K2 Thinking model, Kimi finally closed a financing round at the end of the year: 500 million US dollars, with IDG and several existing shareholders continuing to back it.

2026

Around the Spring Festival of 2026, in the middle of this frenzied season of large-model releases, Kimi was the first to hand in its answer sheet. It may also be the one that makes its peers most uncomfortable: K2.5 is a trillion-parameter model with multimodal understanding of images and video, and it supports both thinking and non-thinking modes. The models released by its startup peers are all text-only; only the closed-source flagships of the big companies have the strength to fold multimodal capabilities in.

On March 16, the Kimi team published a technical paper on Attention Residuals, challenging the residual-connection mechanism that has underpinned neural networks for a decade. OpenAI co-founder Andrej Karpathy commented pointedly that Kimi "made us realize that we didn't fully understand 'Attention is All You Need'". Bear in mind that "Attention is All You Need" is the seminal paper that kicked off the large-model era; even allowing for the inflation of praise in AI circles, that is extraordinarily high recognition. The paper's first author is said to be a 17-year-old high-school student, a remarkably young talent.

On March 17, following CES 2026 at the start of the year, Kimi's model once again served as NVIDIA's go-to model for demonstrating next-generation chip and inference performance during Jensen Huang's keynote at GTC 2026.

On March 18, as the only representative of an independent Chinese large-model company invited to NVIDIA's GTC annual conference, Yang Zhilin gave a speech packed with substance. He characterized the three core modules of deep learning (optimizers, attention mechanisms, and residual connections) as technical standards 8 to 11 years old that have become obstacles to further scaling, and he used new breakthroughs to argue that "every basic technology is worth rethinking".
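For readers who want a concrete picture of the three "standard modules" in question, here is a minimal PyTorch-style sketch of a conventional pre-norm Transformer block trained with Adam. The attention layer, the additive skip connections (the `x + ...` lines), and the Adam optimizer are the decade-old defaults the speech argues are worth rethinking. This is an illustrative sketch of the status quo, not Kimi's proposed alternative; the class and parameter names are my own.

```python
import torch
import torch.nn as nn

class VanillaTransformerBlock(nn.Module):
    """A conventional pre-norm Transformer block built from the long-standing defaults."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # standard attention (2017)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection 1: attention output is added back onto the input,
        # the same additive skip used since ResNet (2015) and the original Transformer.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Residual connection 2: MLP output added back as well.
        x = x + self.mlp(self.norm2(x))
        return x

block = VanillaTransformerBlock()
optimizer = torch.optim.Adam(block.parameters(), lr=3e-4)  # Adam (2014): the long-standing default optimizer
```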

Then came the now-famous "Cursor scandal". Who could have imagined that Composer 2, the new-generation programming model from Cursor, the world's largest programming assistant, valued at 5 billion US dollars, which beat Claude Opus 4.6 on benchmarks, was actually a repackaged Kimi K2.5...

Cursor, positioned as a token intermediary, wanted to push "self-developed" models mainly to escape its heavy dependence on Anthropic and OpenAI. Being held hostage by someone else's technology is a problem that knows no borders: Anthropic really has cut off model supply to programming tools such as Windsurf. In an environment where one company plays both referee and athlete, Cursor's desire for independence is entirely understandable.

But because the gap between its capabilities and its ambitions was so large, Cursor chose to strip out the name of Kimi's base model and raise money on "ghostwritten" work. In the end the matter was resolved gracefully: Cursor's co-founder apologized publicly and laid out in the technical report the detailed reasons for choosing Kimi K2.5 as the base model. Kimi officially responded that it was glad Cursor had built on K2.5, and the two sides completed a technology license through the inference provider Fireworks AI.

According to people familiar with the matter, around the Spring Festival of 2026 Kimi completed financing totaling nearly 2 billion US dollars, at pre-money valuations of 4.8 billion, 6 billion, and 10 billion US dollars respectively. The round that opened in March at an 18-billion-dollar valuation is also oversubscribed, with investors queuing for allocations.

This of course also benefited from the extraordinary run of two peer companies on the Hong Kong stock market, but more importantly it rests on the actual performance of K2 and its successors: the steady stream of positive feedback from Cursor, Cloudflare, Perplexity, Jensen Huang, Elon Musk, Marc Andreessen, Chamath, and others, as well as the fact that Kimi's revenue in the 20 days after K2.5's release exceeded its revenue for the whole of the previous year.

A friend at Kimi said in a private chat that the only thing constraining the business is computing power: unmet demand is still at least tenfold, and more compute would translate directly into more revenue. According to a friend at another large company, some big firms now have to pre-order to get enough access to the Kimi model integrated into their programming tools.

In these 9 months, Kimi achieved a complete turnaround.

Fate

DeepSeek V3 was not built overnight. The DNA of High-Flyer Quant behind it meant that, from 2023 onward, it took a path of extreme efficiency very different from Silicon Valley's. For most of 2023 and 2024 it stayed out of the mainstream narrative, focused on developing MLA (Multi-head Latent Attention) and the DeepSeekMoE architecture, squeezing performance out of limited computing power toward its physical limits. The payoff came in 2025, and it gave other AI startups confidence.
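To make "MLA" a little more concrete: the idea, as described in DeepSeek's papers, is to compress keys and values into a small shared latent vector, so the KV cache grows with the latent dimension rather than with the full per-head key and value dimensions. The sketch below is a simplified, hedged illustration of that low-rank compression idea; it omits details such as the decoupled RoPE path and the causal mask, and the names and dimensions are mine, not DeepSeek's.

```python
import torch
import torch.nn as nn

class SimplifiedMLA(nn.Module):
    """Toy illustration of Multi-head Latent Attention's KV compression (RoPE and masking omitted)."""
    def __init__(self, d_model=1024, n_heads=8, d_head=128, d_latent=256):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.w_q = nn.Linear(d_model, n_heads * d_head, bias=False)
        # Down-project hidden states to a small latent vector; only this needs to be cached.
        self.w_down_kv = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the cached latent back to per-head keys and values at attention time.
        self.w_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.w_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.w_out = nn.Linear(n_heads * d_head, d_model, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        b, t, _ = h.shape
        q = self.w_q(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        c_kv = self.w_down_kv(h)                      # (b, t, d_latent): the compressed KV-cache entry
        k = self.w_up_k(c_kv).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_up_v(c_kv).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.w_out(out)

# Cache cost per token in this toy setup: 256 latent values instead of
# 2 * 8 * 128 = 2048 values for full per-head keys and values.
```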

Everyone is hoping DeepSeek's next-generation model will amaze the world again. But the media's repeated cries of wolf only wear down people's attention. Technological breakthroughs do not come that easily; we have every reason to be more patient and wait for the DeepSeek team's next work.

Kimi K2 was not built overnight either. In fact, the team released the little-noticed K1.5 model on the same day as DeepSeek R1 and was acknowledged by OpenAI as one of the two companies that first replicated o1. At the beginning of 2025, when the outlook was bleakest, they released the Moonlight series of small-scale MoE models to validate a next-generation optimizer technology, which was ultimately applied to the trillion-parameter K2 model. Today Muon has displaced Adam, the standard of the past decade, as the new default adopted by new models such as Kimi, GLM-5, and DeepSeek Engram.
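For context on what "Muon replacing Adam" means in practice: Muon keeps an ordinary momentum buffer for each 2D weight matrix, but before applying the update it approximately orthogonalizes that momentum with a few Newton-Schulz iterations, so the update direction has a much more uniform spectrum than a raw gradient step. The sketch below follows the commonly published form of the algorithm; the coefficients and hyperparameters are the widely cited defaults, not values taken from Kimi's papers.

```python
import torch

def newton_schulz_orthogonalize(m: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map a matrix to the nearest (semi-)orthogonal matrix via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315      # widely cited quintic iteration coefficients
    x = m / (m.norm() + 1e-7)              # normalize so the iteration converges
    transposed = m.shape[0] > m.shape[1]
    if transposed:                          # work with the wide orientation for efficiency
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x

def muon_step(weight: torch.Tensor, grad: torch.Tensor, momentum: torch.Tensor,
              lr: float = 0.02, beta: float = 0.95) -> None:
    """One Muon update for a single 2D weight matrix, applied in place."""
    momentum.mul_(beta).add_(grad)                      # plain momentum accumulation, as in SGD
    update = newton_schulz_orthogonalize(momentum)      # orthogonalized direction replaces the raw step
    weight.add_(update, alpha=-lr)

# Usage sketch: Muon is typically applied to the 2D weight matrices of a network,
# while embeddings, norms, and output heads usually stay on AdamW.
w = torch.randn(256, 512)
buf = torch.zeros_like(w)
g = torch.randn_like(w)
muon_step(w, g, buf)
```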

As the saying goes, what you take, you eventually pay back. Kimi enjoyed the spotlight and over-exposure of 2024, and in 2026 it has not won back that same level of attention.

Everyone has their own fate.

Looking at these two startups, founded at almost the same time, I admire their courage. They never assume the market landscape is settled, they believe technology is the biggest variable, and they dare to pursue AGI. They are young, vigorous, have proven track records, and keep faith in the power of the long term.

Even seen from the end of March 2026, the AI revolution that began at the end of 2022 has been running for barely three and a half years. Everything is just beginning. Why couldn't the next OpenAI or Anthropic be a Chinese company?

This article is from the WeChat official account "Lanxi" (ID: techread), author: Lanxi. Published by 36Kr with authorization.