The Emergence of the GPU "Four Little Dragons": Players like Cambricon Are No Longer Alone

Who can bet right on the next five years?

The "Four Little Dragons" of GPUs are about to assemble in the capital market.

On June 15th, Shanghai Enflame Technology passed the listing review for the STAR Market of the Shanghai Stock Exchange. According to the prospectus, Enflame Technology plans to raise 6 billion yuan in this IPO: 3.3 billion yuan for an AI software-hardware collaborative innovation project, 1.2 billion yuan for R&D of its sixth-generation chips, and 1.5 billion yuan for R&D of its fifth-generation chips.

The capital jigsaw puzzle of domestic GPUs is being completed. The "Four Little Dragons" (Moore Threads, Magic Core Technology, Biren Technology, and Enflame Technology) are about to assemble in the capital market.

This is a moment of acceleration.

From December 2025 to June 2026, in just half a year, at least six AI chip companies have listed or are about to list on the capital market. Adding Cambricon, Hygon Information, and Days Intelligence, which listed earlier, the total market value of the domestic GPU legion is approaching 2 trillion yuan.

More noteworthy is what lies behind the numbers.

Moore Threads posted a book profit of 29.35 million yuan in the first quarter, while Magic Core Technology narrowed its losses by 57.7% and set a clear timetable to break even in 2026. Different numbers point in the same direction:

Domestic GPUs are shortening the distance from technological breakthrough to a positive business cycle at an unprecedented speed.

The DeepSeek Moment of Domestic GPUs

On April 24, 2026, DeepSeek released DeepSeek-V4, its flagship model with one trillion parameters.

A year ago, when V3 was released, the industry was still arguing about "whether domestic chips can run large models." This time, many domestic AI chips, including Huawei Ascend, Cambricon, Hygon, Magic Core, Moore Threads, Kunlunxin, Alibaba's T-Head Zhenwu, and Days Intelligence, completed adaptation on the day the model was released.

What DeepSeek-V4 brings to domestic chips is far more than a technical adaptation. It has changed the frame of reference by which the market judges domestic computing power.

Previously, the default framework for evaluating an AI chip was what percentage of the performance of NVIDIA's same-generation products it could reach. This placed domestic chips in the position of a follower.

However, DeepSeek-V4 in practice offers a new perspective. Zhang Dixuan, president of Huawei's Ascend Computing Business, revealed that the single-card computing power of the Atlas 350, Huawei's AI training and inference acceleration card, has reached 2.87 times that of NVIDIA's H20.

When a model with one trillion parameters can run stably on domestic chips, benchmarking against NVIDIA's most powerful cards is no longer the only selection criterion.

This change in perception is being translated into real money. Market research firm Bernstein Research predicts that by 2026, NVIDIA's share in the Chinese AI chip market will plummet from 95% three years ago to 8%. Huawei will occupy 50%, AMD about 12%, and Cambricon will rank third.

In the competitive landscape, the overall share of domestic AI acceleration cards has exceeded 60%. This is a historic reshaping of the landscape. The barriers that were considered insurmountable three years ago are being rapidly broken down by domestic chips.

The rise of the "Four Little Dragons" of GPUs cannot be ignored either.

On the evening of March 30, 2026, Moore Threads' Kua'e Computing Cluster won a large-scale order worth 660 million yuan. According to the announcement, this single contract is equivalent to 55% of Moore Threads' annual revenue in 2024.

This means that Moore Threads has overcome the engineering barriers of ten-thousand-card clusters and has moved from chip manufacturing to the delivery of ultra-large-scale computing clusters.

Enflame Technology, which is sprinting for the STAR Market this time, benefits from its close ties with leading enterprises.

At Tencent's 2025 full-year results press conference, President Liu Chiping (Martin Lau) disclosed that Tencent invested about 18 billion yuan in new AI products in 2025 and plans to at least double that to more than 36 billion yuan in 2026.

The explosion on the demand side has just begun, and Enflame's share in it is continuously expanding. In the first quarter of 2026, Enflame's revenue was 287 million yuan, a year-on-year surge of 1474%.

Currently, the window is still expanding.

Take Biren Technology as an example. Its revenue in 2025 was 1.035 billion yuan, a year-on-year increase of 207%. Its customers include national-level computing platforms, telecom operators, and large AI model companies. A gross profit margin of 53.8% indicates that its products have sufficient bargaining power in the market.

Behind this is the market window opened by DeepSeek-V4. From the surge in orders for Huawei Ascend, to Cambricon's turnaround from loss to profit, to the Day 0 adaptation of eight domestic chips, the signs all indicate that domestic chips can already carry the production-level inference load of top-tier large models.

Differentiated Breakthroughs via Multiple Paths

If we use only one indicator to measure the gap between domestic GPUs and NVIDIA, the most appropriate one is not chip computing power but time.

NVIDIA's CUDA ecosystem has been accumulating for 20 years. It has 4 million developers globally and is the default target of most mainstream AI frameworks worldwide, forming the moat of NVIDIA's chip empire. For developers, leaving the CUDA ecosystem costs not only money but also a team's years of accumulated code, debugging habits, and toolchain dependencies, which are the muscle memory of developers.

More noteworthy, however, is that domestic GPU companies are taking multiple paths around NVIDIA's solutions, and in far less than 20 years.

The first path is compatibility, which is the path taken by Moore Threads. The software stack of its self-developed MUSA architecture is highly compatible with the CUDA ecosystem, aiming to help developers migrate their applications from the NVIDIA platform at the lowest migration cost.

In other words, Moore Threads provides a low-friction switching channel for a large number of CUDA users. On May 18th this year, at Moore Threads' annual press conference in Beijing, founder Zhang Jianzhong said plainly:

"The goal of MUSA has never been to be a substitute for CUDA, but to allow CUDA developers to seamlessly migrate to domestic platforms and truly achieve plug-and-play."

The second path is to bypass. Huawei Ascend and Enflame Technology adopt a Domain-Specific Architecture (DSA): chips customized specifically for AI training and inference, without pursuing general capabilities such as graphics rendering.

The core idea of this path is to design specifically for AI. By building dedicated computing units, such as matrix and vector computing units, into the chip for high-frequency AI training scenarios, designers concentrate hardware resources on AI computing, achieving higher efficiency and lower power consumption than general-purpose GPUs in AI scenarios.

For example, the single-card performance of Huawei's Ascend 950PR surpassing NVIDIA's H20 is the best footnote to the advantages of the DSA route.

Enflame Technology's development is particularly typical. It breaks from the model of making standard chips and waiting for customers to buy them, and instead collaborates closely with model providers: Tencent puts forward requirements, and Enflame makes highly targeted optimizations. Enflame Technology's three generations of chips have been adapted and launched in hundreds of business scenarios within Tencent, from WeChat voice-to-text conversion to Tencent Meeting minutes, and from advertising recommendation to content review.

This strategy has indeed achieved results within the Tencent system. Enflame Technology's revenue jumped from 301 million yuan in 2023 to 990 million yuan in 2025, a compound annual growth rate of 81.32%.
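The growth-rate arithmetic can be checked with a short sketch. The function name `cagr` is ours, and the inputs are the rounded revenue figures quoted above, so the result (about 81.4%) lands close to, but not exactly on, the reported 81.32%; the small gap presumably comes from unrounded figures in the prospectus, which is an assumption on our part.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.81 means 81%)."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Rounded figures from the article, in millions of yuan.
# 2023 -> 2025 spans two compounding periods.
rate = cagr(start=301, end=990, years=2)
print(f"Enflame revenue CAGR, 2023-2025: {rate:.2%}")
```

Two compounding periods are used because the span runs from full-year 2023 to full-year 2025.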

Biren Technology chooses a hardware-software integration model. Its intelligent computing solution provides self-developed chips, boards, servers, and even complete intelligent computing clusters. It also includes the self-developed BIRENSUPA software platform, which has a complete software stack including compilers, operator libraries, and communication libraries and is compatible with mainstream AI frameworks. At the system level, Biren can deliver ten-thousand-card clusters.

One figure confirms the strength of this combined model: in 2025, Biren's revenue from intelligent computing solutions was 1.028 billion yuan, accounting for over 99% of its total revenue.

The growth path of domestic GPUs can be summarized in one sentence: beyond single-card capabilities, they are building their own ecosystem moats. From general compatibility to specialized efficiency, from chips to solutions, from large models to scientific computing, players are making all-out efforts in every dimension.

From Substitution to Native

The Chinese AI chip market is being reshaped from a unipolar pattern, in which NVIDIA dominated and others followed, into a multipolar battlefield with "sufficient + cheap + controllable" as the new frame of reference.

According to data from institutions such as IDC, total shipments of AI acceleration cards in China in 2025 were about 4 million units. Of these, NVIDIA shipped about 2.2 million, and its market share dropped from a peak of 95% to about 55%. In the same period, domestic manufacturers shipped about 1.65 million units in total.

In this reshuffle, clear tiers have formed in the domestic camp. Led by Huawei Ascend with shipments of 812,000 units, strong players such as Alibaba's T-Head, Baidu's Kunlunxin, and Cambricon have emerged, ending NVIDIA's solo performance.
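For readers who want to see how the shipment figures relate, here is a minimal sketch that turns them into shares. The variable names are ours, and the 150,000-unit remainder is simply what the quoted round numbers leave unattributed; it is not a figure reported by IDC.

```python
# Approximate 2025 shipment figures quoted in the article (units: cards).
TOTAL = 4_000_000
NVIDIA = 2_200_000
DOMESTIC = 1_650_000
HUAWEI_ASCEND = 812_000


def share(part: int, whole: int) -> float:
    """Return part / whole as a percentage."""
    return 100 * part / whole


nvidia_share = share(NVIDIA, TOTAL)              # NVIDIA's share of all shipments
domestic_share = share(DOMESTIC, TOTAL)          # all domestic vendors combined
huawei_of_total = share(HUAWEI_ASCEND, TOTAL)    # Ascend's share of all shipments
huawei_of_domestic = share(HUAWEI_ASCEND, DOMESTIC)  # Ascend within the domestic camp
unattributed = TOTAL - NVIDIA - DOMESTIC         # left over from the rounded figures

print(f"NVIDIA: {nvidia_share:.2f}% of total")
print(f"Domestic vendors: {domestic_share:.2f}% of total")
print(f"Huawei Ascend: {huawei_of_total:.2f}% of total, "
      f"{huawei_of_domestic:.2f}% of domestic shipments")
print(f"Unattributed remainder: {unattributed:,} units")
```

On these numbers, Huawei Ascend alone accounts for roughly half of domestic shipments, which is what the "led by" framing above reflects.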

In March this year, a paper published by the Ling team led by He Zhengyu, chief technology officer of Ant Group, showed that with an optimized lower-specification hardware system, the cost of training on one trillion tokens can be cut from 6.35 million yuan to 5.08 million yuan, a reduction of about 20%.

In other words, domestic chips can already support the training of cutting-edge models without NVIDIA's advanced chips.

According to CITIC Securities' forecast, the Chinese domestic AI chip market will exceed 300 billion yuan by 2026. Three engines (the explosion of large-model training and inference demand, the construction of intelligent computing centers together with rising enterprise AI penetration, and domestic substitution entering a critical stage) will drive the market share of domestic GPUs above 40% in the inference market and above 25% in the training market around 2028.

More crucial changes are taking place at the structural level. In 2026, a dual pattern of "in-depth development in the cloud + explosion at the edge" is taking shape in the AI industry. At the edge, deployment in scenarios such as the industrial Internet, autonomous driving, and digital twins has entered an explosive period. Demand will surge for edge AI nodes that are deployed in huge numbers, serve fragmented scenarios, and are extremely sensitive to power consumption and cost.

This type of demand lies outside NVIDIA's comfort zone and is a big piece of the pie for domestic GPUs: not snatched from NVIDIA, but left behind by it.

Looking deeper, official data from DeepSeek shows that the computing power utilization rate of domestic chips has risen from an industry-wide 60% to 85%, and the inference cost can be cut to one-third of that of NVIDIA's solution.

In other words, leading projects have verified that the closed loop of domestic chips + domestic models + domestic cloud can work.

However, this does not mean that the opportunity window will always be open.

NVIDIA's Blackwell and Rubin series are still being iterated, and the lock-in effect of the CUDA ecosystem has not loosened.

Can domestic GPUs wade into the deep waters of the software ecosystem and build a complete native software stack, including a developer community? Can they make up for the process gap with architectural innovation and break through the ceiling on advanced computing power? Can they move from project-based delivery to platform-based delivery, and from one-off deals to general operation?

These checkpoints determine whether domestic GPUs can move from the substitution narrative to the native one. For now, Enflame Technology's IPO and the gathering of the "Four Little Dragons" in the capital market are just the beginning. Proving out the profit model and incubating their own ecosystem will be the next chapter for domestic GPUs.

This article is from the WeChat official account "Market Value Crystal", author: Editorial Department, published by 36Kr with authorization.

