With nearly 1 billion yuan in new financing, MoXin is accelerating the closed commercial loop for sparse computing and will launch a new generation of accelerator cards by the end of the year.
Zhidx reported on May 28 that domestic AI chip startup MoXin AI announced today that it has completed a Series C round of nearly 1 billion yuan, the latest major financing for a domestic AI chip company.
The round drew a diverse group of investors. It brought together leading state-owned venture capital institutions such as Shenzhen Capital Group, along with industrial capital and market-oriented institutions including Yanshan Technology, Greater Bay Area Common Home, Liding Capital, and Yunsheng Capital. Existing shareholders including Triumph Venture Capital, Chuangxiang Investment, and Shengjing Jiacheng also participated.
MoXin also gave an update on its product roadmap: SparsePrime, its next-generation high-performance general-purpose AI inference card, will be released by the end of this year.
As one of the few domestic startups pursuing a differentiated sparse computing route, MoXin has introduced several AI computing cards based on its self-developed sparse computing chips, establishing a dual-driver model of "deep in-house technology plus deployment in customer scenarios."
On the hardware side, MoXin's AI accelerator cards have taken first place three times in MLPerf, the mainstream global AI benchmark suite. According to the company, its S30 and S40 single cards and its multi-card clusters all outperform NVIDIA's A100 and H100.
On the commercial side, MoXin has won numerous orders across cloud, edge, and computing power network deployments, and has deployed thousand-card clusters in multiple regions across the country.
Demand for AI computing power keeps rising, and capital is pouring into domestic AI chip startups. Amid the industry trends of domestic substitution and faster commercial deployment, MoXin has been among the first to establish a mature closed commercial loop.
Against this backdrop, Zhidx spoke at length with Wang Lvyu, MoXin's board secretary and general manager of its corporate development and capital markets department, and Shang Yong, MoXin's vice president of commercialization, to understand how MoXin has broken through on the differentiated domestic computing power track.
01.
Diverse investors enter the game
The new generation of computing cards will be launched by the end of the year
The computing power era is paying out faster, capital keeps flowing into domestic AI chips, and the pace of IPOs in the industry is picking up. Amid this wave, MoXin is pushing ahead on both the capital and product fronts.
AI applications are being deployed at a growing pace, and the structure of the industry's computing power demand has changed fundamentally. According to estimates from multiple industry institutions, domestic demand for inference computing power has climbed to 10 to 15 times that for training, making inference the new main battlefield in the competition for computing power. At the same time, construction of the computing power network, a core foundation of the digital economy, is accelerating across the board.
MoXin's strategy for capturing this growth and the incremental market is clear.
Wang Lvyu revealed that MoXin's new round of funds will be fully invested in the new generation of AI inference computing cards and the deployment of the computing power network.
On the one hand, MoXin will launch its new-generation SparsePrime computing card this year. Aimed at intelligent computing centers and data centers, it is a high-performance general-purpose AI inference card built on the self-developed Antoum 2.0 chip architecture and designed to work closely with large models and a range of complex inference tasks.
The new funds will cover the entire path for the new-generation card, from product R&D and mass production to market rollout, and provide ample capital reserves for subsequent market expansion.
SparsePrime is MoXin's latest answer to the deployment requirements of today's large models. It supports mainstream Transformer models, so customers can obtain sparse acceleration quickly at essentially no adoption cost. Developers can migrate model code written for PyTorch and TensorFlow, as well as efficient inference frameworks such as vLLM, and deploy it directly with almost no code changes. It also supports custom operator development in the Triton language, which lowers the barrier to use.
On the other hand, MoXin has completed deployments in data centers in the four major regions of Northwest, Southwest, East, and North China, and has achieved large-scale applications in multiple industry scenarios and fields. In the next one to two years, MoXin will complete a more extensive computing power network layout covering the entire eastern, central, and western regions of the country.
The completion of this financing and the upcoming launch of the new-generation computing card together mark a milestone in MoXin's progress so far.
Meanwhile, MoXin is also building a sparse computing ecosystem, strengthening industry-university-research cooperation, and creating a developer community to pave the way for its product expansion.
With a three-pronged layout of cloud-based models, vertical customers, and computing power networks, MoXin holds a core position on the computing power track.
02.
AI accelerator card performance exceeds that of mainstream GPUs
Real-world validation and a closed deployment loop are in place
Across the industry, the AI computing power landscape is differentiating and iterating faster. In the future, the core competition among AI chips will no longer be about simply stacking up compute and power consumption. Computing power utilization is the real barrier.
MoXin is a representative of the domestic AI chip players breaking through with innovation in underlying technology. Its self-developed dual-sparse computing technology addresses the computing power bottleneck and yields a solution that combines high compute, low power consumption, and strong cost performance.
The principle of sparse computing can be compared to the human brain. When people process different tasks, they activate the corresponding areas of the cerebral cortex without invoking the entire neural network. For chips, this means activating only part of the network for each task. Simply put, sparse computing reduces the redundancy of neural network models through underlying innovation and software-hardware co-design in order to improve computing efficiency.
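The idea can be illustrated with a toy sketch. The Python below is not MoXin's implementation; it only assumes that "dual-sparse" refers to skipping work when either a weight or an activation is zero, and all names and the chosen densities (25% nonzero weights, 50% nonzero activations) are hypothetical. It counts multiply-accumulate operations (MACs) for a dense matrix-vector product and for a version that skips zeros, and confirms that both produce the same output.

```python
import math
import random


def dense_matvec(weights, x):
    """Multiply every weight by every input, counting each multiply-accumulate."""
    macs = 0
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            macs += 1
        out.append(acc)
    return out, macs


def dual_sparse_matvec(weights, x):
    """Skip a multiply-accumulate whenever the weight or the activation is zero."""
    # Zero activations are filtered once, up front.
    nonzero_inputs = [(j, xi) for j, xi in enumerate(x) if xi != 0.0]
    macs = 0
    out = []
    for row in weights:
        acc = 0.0
        for j, xi in nonzero_inputs:
            w = row[j]
            if w != 0.0:  # zero weights are skipped inside the loop
                acc += w * xi
                macs += 1
        out.append(acc)
    return out, macs


def random_sparse(n, density, rng):
    """Vector of length n in which roughly `density` of the entries are nonzero."""
    return [rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
            for _ in range(n)]


# Hypothetical sizes and densities, chosen only for illustration.
rng = random.Random(0)
ROWS, COLS = 64, 128
weights = [random_sparse(COLS, 0.25, rng) for _ in range(ROWS)]
activations = random_sparse(COLS, 0.5, rng)

dense_out, dense_macs = dense_matvec(weights, activations)
sparse_out, sparse_macs = dual_sparse_matvec(weights, activations)

same = all(math.isclose(a, b, abs_tol=1e-12)
           for a, b in zip(dense_out, sparse_out))
print(f"dense MACs: {dense_macs}, sparse MACs: {sparse_macs}, same output: {same}")
```

In software, the bookkeeping needed to skip zeros can cost more than the arithmetic it saves, which is one reason the savings are usually pursued with hardware support rather than in code alone.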
MoXin's differentiation is that it does not simply prune models at the algorithm level. Instead, it takes a software-hardware co-design approach and builds sparse computing into the chip's compute core to optimize performance.
This has enabled MoXin's AI accelerator cards to take first place three times in a row in MLPerf, the authoritative global AI benchmark. In specific scenarios, the peak throughput of its S40 is 2.9 times that of the A100 and 1.4 times that of the H100, giving it the lead in single-card performance. In multi-card performance, 4 S30 cards deliver 1.8 times the performance of 4 H100 cards, and even 1.2 times that of an 8-card A100 cluster.
MoXin has now achieved full-chain self-development, from chips and computing cards to industry solutions, and this has become the foundation for the differentiated value it brings to customers.
However, a technological breakthrough is only the first step. The harder part is getting into real scenarios. Shang Yong said that the path MoXin is taking is not purely a technical problem. It also requires repeated validation and trial and error across business scenarios, so in essence it is a complex engineering problem.
He gave an example. In a computing power cluster project that MoXin took part in last year, the customer was a manufacturing enterprise outside the AI field. Its core concern was straightforward: to obtain a computing power solution with a better energy-efficiency ratio within a limited budget. After analysis, MoXin's R&D staff found that the overall solution they designed for the cluster could cut costs by 30% to 50% compared with solutions on the market while meeting the enterprise's specific scenario requirements.
Only through this kind of in-depth technical work and scenario refinement can the value of computing power be fully released.
Shang Yong summarized MoXin's strategic focus as "walking on two legs." On the technology side, it relies on its proprietary sparse computing architecture to break through the computing power bottleneck and significantly improve computing power utilization and hardware density. On the market side, it focuses on AI inference, which it sees as the most promising track. Because inference workloads are closely tied to industrial application scenarios, MoXin works with customers on chip customization and large-scale commercialization, binding technology tightly to scenarios.
MoXin now has a tiered product lineup ranging from the S4 to the S40, which can meet computing power requirements from traditional small models to smaller large models and on to ultra-large-scale models. The new product due this year is designed for the computing power that ultra-large-scale models may require over the next three to five years.
In MoXin's view, what can truly reshape the cost of computing power and break through the performance ceiling is innovation in the computing paradigm at the level of the underlying architecture.
03.
Targeting the rigid demand for AI cost reduction
MoXin has won commercial orders in multiple fields
As the AI industry moves from technology experimentation to broad industrial adoption, cost reduction, efficiency improvement, adaptability, and profitability have become key criteria for judging the value of AI chips. MoXin, which is in line with this trend, already has a scalable, profitable, and replicable capability for commercial deployment.
Breaking it down, the efficiency improvement is reflected in the fact that MoXin's computing cards have taken first place three times in mainstream benchmarks. In terms of cost reduction, the current focus is on the cost per token.
Data from the National Data Administration shows that at the end of March this year, the daily average token call volume in China exceeded 140 trillion, more than a thousand-fold increase over the daily average at the beginning of 2024. Facing this huge demand, sparse computing can streamline computation while preserving model accuracy. According to MoXin, in actual business scenarios the per-token operating cost of its solution is much lower than that of mainstream GPU products, and the advantage is greater still in certain scenarios.
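To make the cost-per-token metric concrete, here is a rough Python sketch of how such a figure might be computed. Every number and name in it is a hypothetical placeholder, not data from MoXin, NVIDIA, or any other vendor, and the cost model (amortized hardware plus electricity, divided by throughput) is a deliberate simplification. The last lines only restate the article's own figures: a thousand-fold increase to 140 trillion tokens per day implies an early-2024 daily average below roughly 140 billion.

```python
import math
from dataclasses import dataclass


@dataclass
class CardProfile:
    """Hypothetical accelerator profile; all fields are illustrative placeholders."""
    name: str
    card_price: float           # purchase price, in any one currency unit
    lifetime_hours: float       # period over which the price is amortized
    power_kw: float             # average power draw under load
    electricity_per_kwh: float  # energy price, same currency unit
    tokens_per_second: float    # sustained inference throughput

    def cost_per_million_tokens(self) -> float:
        hourly_hardware = self.card_price / self.lifetime_hours
        hourly_energy = self.power_kw * self.electricity_per_kwh
        tokens_per_hour = self.tokens_per_second * 3600
        return (hourly_hardware + hourly_energy) / tokens_per_hour * 1_000_000


# Two made-up cards that differ only in throughput.
baseline = CardProfile("baseline", 100_000, 30_000, 0.5, 0.6, 2_000)
faster = CardProfile("faster", 100_000, 30_000, 0.5, 0.6, 4_000)

baseline_cost = baseline.cost_per_million_tokens()
faster_cost = faster.cost_per_million_tokens()
print(f"{baseline.name}: {baseline_cost:.4f} per million tokens")
print(f"{faster.name}: {faster_cost:.4f} per million tokens")

# Figures quoted in the article: more than 140 trillion tokens per day,
# more than a thousand-fold increase over the early-2024 daily average.
daily_tokens_now = 140e12
implied_early_2024_ceiling = daily_tokens_now / 1_000
print(f"implied early-2024 daily average: below {implied_early_2024_ceiling:.2e} tokens")
```

With all other inputs held equal, doubling throughput halves the cost per token in this model, which is why utilization and throughput gains translate so directly into operating cost.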
On this basis, Wang Lvyu revealed that MoXin has won numerous commercial orders across cloud, edge, and computing power network deployments and expects to break even in the next one to two years.
MoXin's rapid growth reflects the core shift under way in the AI industry. As the industry landscape changes, it has held to sparse computing as its constant core route and is capturing a share of a trillion-level market.
On the one hand, MoXin's persistence with sparse computing is paying off. AI applications can only be deployed widely if marginal costs keep falling, and that matches MoXin's technical route. The company keeps reducing application costs through technological innovation rather than blindly chasing other leading-edge technologies.
On the other hand, the AI industry is changing rapidly. MoXin needs to iterate its product forms faster to build a general-purpose product lineup that can meet the differing needs of various customers and scenarios. With new models emerging continuously, no enterprise can accurately predict where they are headed. In Shang Yong's view, MoXin therefore always starts from customers' needs, so that products meet their requirements from the design stage.
Across the AI field, the commercialization models of model vendors are largely taking shape, and cost reduction has become a hard requirement across the industry. As GPU dominance gradually weakens, domestic AI chips built on multiple technical routes are entering a period of commercial takeoff, and the track MoXin is on continues to pay out.
04.
Conclusion: The explosion of AI inference demand
Sparse computing rides the wave
The AI inference market is booming. In September last year, Jensen Huang, the founder and CEO of NVIDIA, said in an interview with foreign media that the growth of AI inference is not 100 times or 1,000 times, but 1 billion times.
MoXin's sparse computing technology is well placed for this trend, since it is naturally suited to the efficiency and cost requirements of inference. Around this core, MoXin has also built a differentiated barrier made up of software-hardware co-design, scenario validation, and ecosystem development.
As inference demand surges, the sparse computing track appears to be entering its breakout window.
This article is from the WeChat official account "Zhidx" (ID: zhidxcom), author: Cheng Qian, editor: Mo Ying. It is published by 36Kr with authorization.