36Kr Exclusive | Startup Founded by a Huawei "Genius Youth" Raises Over RMB 400 Million in Consecutive Rounds to Build a New Generation of Inference Chips and Rebuild the Cost of Memory
Author | Qiao Yujie
Editor | Yuan Silai
Yingke has learned that Beijing Xingyun Integrated Circuit Co., Ltd. (hereinafter "Xingyun"), a domestic company building fully self-developed GPGPUs, has announced the completion of consecutive Pre-A and Pre-A+ financing rounds totaling more than RMB 400 million. Wuyuan Capital, SAIF Partners, and Primavera Capital jointly led the rounds, with participation from local state-owned capital in Beijing, Jiangsu, and elsewhere, as well as industry investors including BAW Storage (688525), GSR Ventures, the family offices of founders of well-known GPU companies, and Skyworth Capital. Yunxiu Capital has advised across multiple consecutive rounds and will act as the exclusive financial advisor for the next round of financing.
Founded in August 2023, Beijing Xingyun Integrated Circuit Co., Ltd. focuses on new-generation inference chips for large models. It is committed to building fully self-developed, CUDA-compatible GPGPU products with ultra-large memory capacity on a non-3D DRAM architecture, aiming to make AI large-model inference widely affordable.
Dr. Ji Yu, Xingyun's founder, holds a Ph.D. in computer science from Tsinghua University and was selected for Huawei's "Genius Youth" program; at Huawei HiSilicon he was deeply involved in compiler and architecture R&D for the Ascend AI chips. Dr. Yu Hongmin, the CTO, holds a Ph.D. from the Institute of Semiconductors, Chinese Academy of Sciences; he has led the development and mass production of multiple chips, including Baidu's Kunlun chip and Huawei HiSilicon's Ascend, with more than a dozen successful tape-outs to his credit.
As large-model architectures continue to evolve, the bottleneck of compute systems is undergoing a structural shift.
In an interview with Yingke, Ji Yu said that today's algorithm-side evolution is reshaping hardware design logic. Sparse models, typified by MoE (Mixture of Experts), are more compute-efficient, but they must keep far more expert parameters loaded, so their overall memory-capacity demand is significantly higher than that of traditional dense Transformer models.
The memory demand of large models (hundreds of billions to trillions of parameters) has jumped from the GB level to the TB level. In the process, the system cost structure has been rebuilt: memory, priced per GB, is gradually overtaking the compute chip itself as the dominant cost. "The key to cost reduction lies not in compute but in memory," Ji Yu said.
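To make the GB-to-TB jump concrete, here is a back-of-the-envelope sketch in Python. The parameter counts and the FP16-weights assumption are illustrative numbers chosen for this article, not figures from Xingyun:

```python
# Back-of-the-envelope memory estimate for holding model weights.
# All numbers are illustrative assumptions, not Xingyun's figures.

def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed just to hold the weights (FP16/BF16 by default)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A dense 70B model vs. a hypothetical 1T-parameter MoE model: MoE must
# keep all expert parameters resident even though only a few experts are
# active per token, so capacity scales with total (not active) parameters.
for name, params_b in [("dense 70B", 70), ("MoE 1T", 1000)]:
    print(f"{name}: ~{weight_memory_gb(params_b):,.0f} GB of weights")
# dense 70B: ~140 GB   -> fits across a few HBM-class accelerators
# MoE 1T:  ~2,000 GB   -> TB-scale capacity, where per-GB cost dominates
```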
Based on this judgment, Xingyun has taken a technical route different from the mainstream: abandoning high-cost HBM (High Bandwidth Memory) in favor of lower-cost media such as LPDDR and even NAND (SSD flash dies) as memory. This media substitution cuts memory cost by one to two orders of magnitude.
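As a rough illustration of the claimed savings, the sketch below compares capacity cost across media using ballpark per-GB prices of the kind publicly discussed in the industry; the exact dollar figures are assumptions, not Xingyun's bill of materials:

```python
# Illustrative per-GB cost comparison across memory media.
# Prices are rough public ballparks (assumptions), not vendor quotes.
cost_per_gb_usd = {"HBM": 10.0, "LPDDR5": 2.0, "NAND (SSD flash dies)": 0.10}
capacity_tb = 2.0  # TB-scale capacity, per the weight estimate above

for medium, usd in cost_per_gb_usd.items():
    print(f"{medium:>22}: ~${usd * capacity_tb * 1000:,.0f} for {capacity_tb} TB")
# HBM ~ $20,000, LPDDR5 ~ $4,000, NAND ~ $200: moving down the media
# hierarchy shifts capacity cost by roughly one to two orders of magnitude.
```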
Low-cost media, however, also means lower per-die bandwidth. To compensate, Xingyun's architecture uses a multi-die, multi-channel parallel design, scaling aggregate bandwidth to the TB/s level through large-scale stacking to meet the data-throughput requirements of large-model inference.
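A minimal sketch of the bandwidth-aggregation argument, using publicly cited ballpark figures for HBM3 and LPDDR5X; the channel counts are assumptions for illustration, not the specs of any Xingyun product:

```python
# Aggregate-bandwidth sketch: compensating for low per-device bandwidth
# by fanning out many channels in parallel. Figures are rough ballparks.

def aggregate_bw_gbs(per_channel_gbs: float, channels: int) -> float:
    """Total bandwidth when `channels` devices are read in parallel."""
    return per_channel_gbs * channels

hbm_style   = aggregate_bw_gbs(800, 6)   # ~6 HBM3 stacks on one package
lpddr_style = aggregate_bw_gbs(68, 32)   # 32 LPDDR5X packages, ~68 GB/s each

print(f"HBM-style:   {hbm_style / 1000:.1f} TB/s from 6 stacks")
print(f"LPDDR-style: {lpddr_style / 1000:.1f} TB/s from 32 packages")
# ~4.8 TB/s vs ~2.2 TB/s: a wide, cheap fan-out of slow channels can
# reach the same order of magnitude of bandwidth at far lower cost per GB.
```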
Ji Yu noted that as sparsification and the MoE architecture advance, models' absolute demand for bandwidth is falling. System design no longer needs to chase extreme bandwidth blindly; it can instead strike a balance between cost and efficiency through hardware-software co-design.
This thinking runs through Xingyun's overall technical strategy. Ji Yu emphasized that the company's real scarcity lies not in single-chip specs but in system-level design capability. Through engineering techniques such as Prefill/Decode separation (PD separation) and KV-cache sparsification (see the sizing sketch below), Xingyun can adapt more flexibly to rapid changes in AI applications, from early chatbots to today's emerging agent scenarios, reducing the risk that the long chip R&D cycle leaves its products behind the technology curve.
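To see why KV-cache sparsification matters at long context lengths, consider this rough sizing sketch; the model shape (layers, KV heads, head dimension, context length) is an illustrative assumption:

```python
# KV-cache sizing sketch: the cache grows linearly with context length
# and batch size, so sparsifying it directly cuts memory demand.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_value: int = 2) -> float:
    """FP16 KV-cache size in GB; the factor 2 covers keys plus values."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value / 1e9

full = kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                   seq_len=128_000, batch=8)
print(f"full KV cache: ~{full:.0f} GB")  # ~336 GB for this shape
# Keeping, say, 25% of cache entries cuts this 4x, and Prefill/Decode
# separation lets prefill and decode nodes size memory independently
# instead of provisioning every node for the worst case.
```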
For product validation, Xingyun's earlier "Brown Ant" all-in-one machine built a low-cost inference solution from CPUs and commodity memory to verify that sparse models are feasible on non-high-end hardware. The solution has already been deployed in on-premises DeepSeek scenarios.
(Image source: the company)
Next, the company will focus on advancing its self-developed chips. Ji Yu said the core goal this year is to complete the chip's tape-out and bring it to market as soon as possible, making the chip itself the main vehicle for commercialization.
Meanwhile, the phenomenal spread of OpenClaw reveals huge market demand for consumer-grade hardware that can carry high-quality AI. Ji Yu said Xingyun hopes its chips will bring the low-cost, high-quality compute of trillion-parameter models to edge devices such as claw machines, breaking the current limitation that edge devices can only run small 100B-class models and opening new room for imagination in the consumer-electronics market.
CTO Yu Hongmin said Xingyun's design priority has shifted from chasing extreme single-chip performance to pursuing scalability and supply-chain stability at the board and system level. Achieving optimal cost and a consistent performance experience at the system level, through distributed design, mature process nodes, and low-cost storage, is an important foundation for the company's goal of popularizing compute.
Views from Investors
Li Gang, vice president of Fengrui Capital, said: As an angel-round investor, we have seen the Xingyun team hold highly forward-looking views on AI chips, especially AI chips in the era of large models, ever since they founded the company in 2023. Through three years of rapid change in models and applications, Xingyun's chip solutions and forward-looking design concepts for next-generation general large models have been continuously validated, always staying half a step ahead of the times.
Wuyuan Capital said: Xingyun is a rare "first-principles" thinker in the AI-chip field. As early as 2024, Dr. Ji Yu foresaw the structural shift of the hardware bottleneck from compute to memory under the MoE sparse architecture. Abandoning HBM and rebuilding memory cost around LPDDR and even NAND is not an incremental optimization but a paradigm-level innovation driven by system-level hardware-software co-design. Since 2026, as AI models' coding and agent capabilities have kept strengthening, demand for AI inference has exploded. The phenomenal popularity of agents such as OpenClaw is pulling inference compute demand from the cloud to many edges, and from programmers to the general public; efficient, low-cost inference has become an industry necessity. As inference demand grows exponentially, Xingyun's technical path will become important infrastructure for popularizing compute.
Jiang Chihua, managing partner for the technology track at SAIF Partners, said: As AI large models evolve toward trillion-parameter scale, and especially given the constraints on domestic compute, the key to cost reduction lies not in compute but in memory and system architecture. Dr. Ji Yu and the Xingyun team have shown rare system-level engineering vision. Breaking out of the industry's fixation on stacking HBM, they have cut memory and system costs by one to two orders of magnitude through media substitution (LPDDR/NAND) and parallel architecture design, driving per-token cost toward its minimum, in line with the direction of industry evolution. We have always focused on underlying disruptors in AI and embodied intelligence; Xingyun combines forward-looking architectural innovation with solid execution. SAIF is honored to invest heavily in this round, and we look forward to Xingyun's new-generation inference chips thoroughly rebuilding the compute cost model and truly popularizing large-model inference in the cloud and at the edge.
Primavera Capital said: Against the backdrop of an accelerating domestic compute ecosystem, Xingyun has keenly centered its design on rebuilding memory cost, replacing HBM with LPDDR and NAND and compensating for limited per-die bandwidth with a multi-channel parallel architecture; in essence, it is redefining the cost architecture of inference chips. Dr. Ji Yu thinks deeply about the evolution of AI chip architecture beyond industry inertia. His judgment that "the key to cost reduction lies in memory rather than compute" has consistently run half a step ahead of industry consensus, and each step has been validated by the market. Dr. Yu Hongmin, a seasoned chip veteran whose projects span Huawei HiSilicon's Ascend and Baidu's Kunlun chip, has hands-on experience in every link from chip design to mass production. This combination of daring to think and being able to execute gives Xingyun a complete closed loop from architectural innovation to product delivery. We look forward to the successful tape-out of Xingyun's first self-developed chip, which will kick off a new round of cost revolution in AI inference compute.
Wang Can, deputy general manager of BAW Storage Technology Co., Ltd., said: As large models evolve from general AI toward the agent form, Dr. Ji Yu has shown extremely forward-looking system-level insight. He accurately identified that the structural bottleneck of large-model inference is no longer compute alone but the cost of memory, which constrains large-scale deployment. Xingyun's core logic is very clear: using low-cost LPDDR and even NAND to challenge the dominance of expensive HBM through media substitution and architectural innovation is not merely a change of physical media but a fundamental rebuild of the cost structure of large-model inference. Around this core path, Xingyun has built excellent hardware-software co-design, balancing performance and cost at the system level through techniques such as PD separation and distributed scale-out. In the chip industry, engineering experience determines the distance from "laboratory architecture" to "commercial mass production"; the deep tape-out and mass-production experience the Xingyun team accumulated on top-tier projects such as Ascend and the Kunlun chip is the source of its certainty. In today's era of exploding AI agents, I firmly believe Xingyun can truly break through the compute-cost barrier and bring high-quality trillion-parameter models to genuinely popularized compute.