
Sixteen months later, DeepSeek is no longer walking alone in the dark alley.

硅星人Pro · 2026-04-27 09:16
In China's open-source community, there is a sense of mutual appreciation. "It's quite chaotic over in the United States."

The Chinese AI drama originally scheduled for the Spring Festival in 2026 has been postponed to before the May Day holiday.

This Friday, DeepSeek V4 finally made its long-awaited debut.

Also this week, Qwen, Kimi, Xiaomi, and Tencent all unexpectedly presented their latest masterpieces.

According to the latest open-source model intelligence index released by Artificial Analysis, the top-ranked open-source models are all Chinese.

Among them, the top two were both released this week. They also come from the two companies that have squeezed into the global top five by actual call volume on OpenRouter in recent days.

This is not the first time DeepSeek and Kimi have been this in sync. Look back at the previous occasions:

In January 2025, DeepSeek R1 and Kimi K1.5 were released within two hours of each other, both targeting OpenAI o1.

One month later, DeepSeek NSA and Kimi MoBA appeared almost simultaneously, both reworking the attention mechanism at the core of the Transformer.

In April 2025, Kimi's Kimina Prover Preview and DeepSeek-Prover-V2 were released one after another, both advancing toward formal mathematical reasoning and theorem proving.

A year on, Kimi K2.6 and DeepSeek V4 have once again landed back to back in the same week, putting two trillion-parameter open-source models on the table one after the other.

They are pushing in the same technical direction and arriving at the same intersection at almost the same moment. That no longer looks like coincidence.

1

What did they coincide on this time?

Let's first see what each of them presented in this round.

DeepSeek V4 is an MoE model with 1.6 trillion total parameters and 49B active parameters, natively supporting a 1-million-token context. Its core narrative is an efficiency revolution: compared with the previous-generation V3.2, the compute required for single-token inference has dropped by 73%, and the KV cache has been compressed to one tenth of its former size.

Simply put, the same hardware can handle many more requests, and the cost for the same length of text is much lower.
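To make the "active parameters" arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It leans on the common approximation of roughly 2 FLOPs per active parameter per generated token; the headline figures are the article's reported numbers, and the per-token KV footprint at the end is a made-up illustrative value, not a V4 spec.

```python
# Back-of-the-envelope sketch: why "active parameters" dominate per-token cost in an
# MoE model. Uses the common ~2 FLOPs per active parameter per token approximation.

TOTAL_PARAMS = 1.6e12   # V4's reported total parameter count
ACTIVE_PARAMS = 49e9    # parameters actually used per token (routed experts + shared)

flops_dense = 2 * TOTAL_PARAMS   # if every parameter fired on every token
flops_moe = 2 * ACTIVE_PARAMS    # only the routed experts fire

print(f"dense-equivalent: {flops_dense:.2e} FLOPs/token")
print(f"MoE (49B active): {flops_moe:.2e} FLOPs/token")
print(f"ratio: {flops_dense / flops_moe:.0f}x less compute per token")

# The KV cache grows with context length; a 10x compression means roughly 10x more
# concurrent long-context requests fit in the same GPU memory, all else equal.
KV_BYTES_PER_TOKEN = 100 * 1024   # hypothetical per-token footprint, illustration only
context = 1_000_000
before = KV_BYTES_PER_TOKEN * context / 2**30
print(f"KV cache @1M tokens: {before:.0f} GiB -> {before / 10:.0f} GiB after 10x compression")
```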

Meanwhile, V4 has completed deep adaptation to Huawei Ascend chips, migrating its underlying code from the NVIDIA CUDA ecosystem to Huawei's CANN architecture, which gives this release an additional layer of meaning: migration to domestic compute.

Kimi K2.6 is a multimodal MoE model with a trillion-scale parameter count, 32B active parameters, and a 256K context. Its core narrative is not about being bigger or cheaper, but about lasting longer.

In testing, K2.6 coded continuously for 13 hours, handling more than 4,000 tool calls and modifying more than 4,000 lines of code to complete a deep refactor of an open-source financial matching engine that pushes close to its performance limits.

This is not an ordinary "improvement in coding ability", but a test of whether the model can move beyond one-shot responses into a long-running, multi-tool, multi-agent collaborative working state.

K2.6 also introduces an agent-cluster architecture supporting parallel collaboration among 300 sub-agents. Moonshot AI's RL infrastructure team has had a K2.6-driven agent run autonomously for five consecutive days, handling monitoring, incident response, and system operations.

They keep meeting at the same intersection, but the directions they take are different. At least in this round, one seems to be rewriting the cost structure of model infrastructure, while the other is verifying whether a model can take on longer-cycle real-world tasks. Different directions, yet the fact that both shipped in the same week is already enough for people to screenshot and share in group chats.

That said, the two companies have made highly consistent choices: trillion-parameter MoE architectures, open-source releases, and a continued belief in scaling laws. As of now, theirs are also the only two trillion-parameter open-source models in China.

2

Something more interesting than the coincidence

The multiple coincidences make a good story, but there is a more noteworthy phenomenon behind it: the technical routes of the two companies are inspiring each other.

Last time around, Kimi K2 borrowed the MLA attention mechanism popularized by DeepSeek V3. MLA compresses the attention computation and the KV cache to improve efficiency, and DeepSeek V3 made it a prominent option in the technical stack of Chinese open-source models.
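For readers who want the intuition behind MLA, here is a minimal NumPy sketch of the latent-KV idea, simplified from public descriptions: instead of caching full per-head keys and values, cache one small latent vector per token and up-project it when attention is computed. The dimensions below are illustrative placeholders, not the real V3/V4 configuration, and details such as the decoupled RoPE branch are omitted.

```python
import numpy as np

# Sketch of the latent-KV compression idea behind MLA (Multi-head Latent Attention).
d_model, n_heads, d_head, d_latent = 4096, 32, 128, 512
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent)) * 0.02          # compress hidden state
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # rebuild keys
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # rebuild values

seq_len = 1024
hidden = rng.standard_normal((seq_len, d_model))

# What gets cached per token: one small latent vector.
latent_cache = hidden @ W_down                    # (seq_len, d_latent)

# Reconstructed at attention time (in practice the up-projection can be folded
# into the query side so K/V never need to be materialized in full).
K = latent_cache @ W_up_k                         # (seq_len, n_heads * d_head)
V = latent_cache @ W_up_v

full_kv_floats = seq_len * 2 * n_heads * d_head   # classic KV cache: K and V per head
mla_floats = seq_len * d_latent                   # latent cache only
print(f"classic KV cache: {full_kv_floats:,} floats per layer")
print(f"latent cache:     {mla_floats:,} floats per layer "
      f"({full_kv_floats / mla_floats:.0f}x smaller)")
```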

This time, DeepSeek V4 lists the Muon optimizer as one of its three major updates at the model-architecture level. Muon is a second-order optimizer that tackles the efficiency and stability of parameter updates during training, replacing Adam, which has been in use for a decade. Kimi was one of the first teams to push a Muon-family optimizer to trillion-parameter-scale training and to share that experience publicly and systematically; Yang Zhilin said in his GTC 2026 talk that it can deliver a 2x improvement in token efficiency. V4 now follows suit, using the Muon optimizer to improve convergence efficiency and training stability.
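For flavor, here is a simplified sketch of a Muon-style update, based on public descriptions of the optimizer: keep a momentum buffer for each 2D weight matrix, then approximately orthogonalize that momentum with a few Newton-Schulz iterations before applying it. The released implementations use a tuned quintic iteration and further refinements (Moonshot's MuonClip variant adds more); the cubic version and hyperparameters below are placeholders that only show the idea.

```python
import numpy as np

def newton_schulz_orthogonalize(M, steps=5, eps=1e-7):
    """Push M toward the nearest (semi-)orthogonal matrix via cubic Newton-Schulz."""
    X = M / (np.linalg.norm(M) + eps)          # normalize so the iteration converges
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def muon_like_step(weight, grad, momentum, lr=0.02, beta=0.95):
    momentum = beta * momentum + grad           # standard momentum accumulation
    update = newton_schulz_orthogonalize(momentum)
    weight = weight - lr * update               # step with roughly equal singular values
    return weight, momentum

# Toy usage: one step on a random "layer".
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128)) * 0.02
g = rng.standard_normal((256, 128)) * 0.01
m = np.zeros_like(W)
W, m = muon_like_step(W, g, m)
print("update applied; ||W|| =", round(float(np.linalg.norm(W)), 3))
```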

In other words, MLA saves money at inference time and Muon saves time during training. Both paths have been walked back and forth between the two companies.

This makes the "coincidence" no longer just a coincidence in the release time, but an echo at the technical stack level. It's more like the two companies are competing while using the technical ideas explored by the other as a reference coordinate for their next round of experiments.

This mutual inspiration keeps extending. On the attention mechanism, DeepSeek is exploring sparse attention while Kimi's next-generation model is exploring linear attention. The paths differ, but the question to answer is the same: how to keep long context from being dragged down by the quadratic cost of full attention. A generic sketch of that trade-off follows below.
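The sketch below illustrates the generic complexity argument only; it is not the specific sparse-attention or linear-attention design of either company. Full attention materializes an n-by-n score matrix, while kernel-style linear attention reassociates the matrix products so only a fixed-size state is kept (sparse attention instead skips most of the n-by-n entries). The causal-masked case would additionally need a running prefix state.

```python
import numpy as np

n, d = 4096, 64                                  # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Full attention: the n x n score matrix is the bottleneck,
# O(n^2 * d) time and O(n^2) memory.
scores = Q @ K.T / np.sqrt(d)                    # (n, n)
probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
out_full = probs @ V

# Linear attention (kernel trick, phi(x) = elu(x) + 1): never form the n x n matrix,
# O(n * d^2) time and O(d^2) state.
phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
Qp, Kp = phi(Q), phi(K)
kv_state = Kp.T @ V                              # (d, d) summary of the whole context
norm = Qp @ Kp.sum(axis=0)                       # (n,) normalizer
out_linear = (Qp @ kv_state) / norm[:, None]

print("full attention score matrix:", scores.shape, "entries:", scores.size)
print("linear attention state:     ", kv_state.shape, "entries:", kv_state.size)
```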

On residual connections, DeepSeek is working on mHC and Kimi on attention residuals. Again, different solutions aimed at the same goal: keeping training stable as models get deeper.

This is worth pointing out because, against the broader industry backdrop, it is actually unusual. The leading Silicon Valley labs are becoming more and more closed: OpenAI has long stopped disclosing training details, and Anthropic's and Google's core methods are likewise kept under wraps. The community can only reconstruct their technical routes through guesswork and piecing together clues, let alone expect them to shake hands on stage 😂

Between Kimi and DeepSeek, the visibility of technical reports and open-source code has significantly shortened the chain of technology diffusion. The very reason these coincidences can be seen, discussed, and compared side by side is that both companies have chosen to put their work on the table.

Technology now diffuses through Chinese open-source models far faster than before. That may be what the frequent coincidences really indicate.

3

The global technology circle is watching their coincidences

This "coincidence" narrative was of course first invented by the Chinese technology circle. But the overseas developer community is also confirming this in its own way.

After the release of K2.6, Latent Space, one of the most influential newsletters in the AI field, directly placed Kimi in the position of "the leader of Chinese open-source model laboratories after DeepSeek's silent period". A few days later, when V4 was released, the overseas developer community immediately put V4, K2.6, and GLM 5.1 in the same table to compare parameters, prices, context lengths, and agent capabilities.

The Chinese models used to demonstrate the inference performance of next-generation chips at NVIDIA GTC 2026 were these same two.

When the overseas developer community discusses Chinese open-source models, Kimi and DeepSeek are indeed increasingly placed side by side.

4

What they are colliding with is not each other

This also makes the relationship between DeepSeek and Kimi a bit delicate. Of course, they are competitors, but in the larger model ecosystem, they jointly push Chinese open - source models to a more prominent position.

The pressure they exert on closed-source models does not come from any single benchmark, but from slower, more fundamental variables: cost, deployability, open weights, and the speed of technology diffusion.

So, is Kimi deliberately colliding with DeepSeek?

Most likely not. Trillion-parameter MoE has to be built, the attention mechanism has to be reworked for long context, the optimizer has to be replaced for training efficiency, adaptation to domestic chips has to be tackled, and open source has to be done sincerely rather than defensively. These are not options; they are the only roads forward.

Both companies are doing serious work on underlying technology and have chosen to put their key progress out in public, so they meet at the same crossroads again and again.

It's not that they are too in sync; it's that the road is too narrow.

As for the next "coincidence", it's probably on the way.

If I had to guess, Kimi's approach of advancing a model's text and visual capabilities in lockstep will inspire more Chinese open-source text-only models to "grow eyes" and see a farther, larger world together.

This article is from the WeChat official account "Silicon Star People Pro", author: Zhou Yixiao, published by 36Kr with authorization.