Exclusive Interview with Ding Tian: Secured Two Rounds of Financing in Two Consecutive Months — What Has ZIcore, the Dark Horse in the AI for Computing Track, Done Right?
36Kr has learned that Zizixin Yuan, an AI-driven computing acceleration infrastructure company, recently announced the completion of its Angel+ financing round. The round was led by Dingfeng Kechuang (Wuyuefeng Venture Capital), Inno Angel Fund, and Shoucheng Capital, with existing shareholders such as Tongchuang Weiye making oversubscribed follow-on investments.
This is a star startup defined by a striking contrast. Against the backdrop of surging AI computing demand and the twin constraints of chip supply and the computing power ecosystem, Zizixin Yuan neither manufactures chips nor sells large model tokens. Yet it has completed two rounds of financing in two months, raising nearly 100 million yuan and winning the backing of leading hard-technology investors.
How did this company precisely enter the intersection of AI Infra and the computing power ecosystem? Recently, 36Kr conducted an exclusive interview with Ding Tian, the founder of Zizixin Yuan, in an attempt to uncover the underlying logic behind the company's rapid development.
AI Computing Acceleration: The Next-Generation Computing Infrastructure
The development of human science and industry has always been accompanied by the improvement of computing efficiency. In the past few decades, from general-purpose chips to GPUs and cluster computing, the evolution of hardware architectures has continuously driven the growth of computing power.
However, in the new stage, the core contradiction in the computing power industry has begun to emerge.
Currently, the iteration of large models has led to a rapid increase in AI computing demand. However, limited by the physical limits of the manufacturing process, single-card costs, energy consumption, and the maturity of the domestic chip ecosystem, the supply constraints on the hardware side are becoming increasingly obvious.
At this point of supply-demand imbalance, the key to the next stage of computing power competition is no longer just manufacturing more chips, but effective computing power, that is, the computing performance that a chip can stably release in real business.
In the exclusive interview with 36Kr, Ding Tian, the founder of Zizixin Yuan, said: "In the current domestic computing ecosystem, for many customers, 'buying chips' does not mean they immediately have 'usable computing power'."
A common pattern is that an enterprise spends heavily on a batch of domestic accelerator cards with high theoretical performance, only to find that, when it actually deploys its models, the performance achieved may be just 30% to 40% of the theoretical computing power. This shortfall directly raises the unit token cost for large model vendors and slows the delivery cycle of new products and industry applications.
The reason for this gap is not just that the hardware parameters are insufficient.
Currently, a large number of mainstream models and cutting-edge algorithms were originally developed in the NVIDIA CUDA ecosystem. Ding Tian said: "When the algorithm architectures of many large models were designed, the goal was to run faster on NVIDIA chips, so they are naturally better suited to GPUs, and little consideration was given to how to run efficiently on other heterogeneous chips."
When these models are migrated to domestic cards or other heterogeneous platforms, the problem is not simply code translation. It involves the underlying hardware architecture, operator coverage, memory scheduling, communication mechanisms, and compilation optimization. If any link is not connected, the final performance will be affected.
In the past, filling these underlying ecosystem gaps highly relied on scarce senior system engineers for manual adaptation. Engineers needed to understand the model structure, operator implementation, hardware characteristics, and operation feedback, and then gradually improve the performance through repeated parameter tuning, testing, and verification.
This method could solve problems in the early stage, but it is difficult to support the rapidly expanding computing power demand. Tuning experience is often difficult to reuse among different chips, different models, and different business loads. Every time a new type of hardware or algorithm appears, the team may need to do a new round of adaptation.
As Ding Tian said, in the face of emerging new algorithms, "if each adaptation takes several months, computing power operators will find it difficult to maintain market competitiveness."
This is not a problem that can be solved by engineers working harder to write code. What Zizixin Yuan aims to do is turn computing power adaptation from workshop-style manual delivery into a reusable, automatically searchable, and continuously converging engineering pipeline, so that as much theoretical computing power as possible is converted into effective computing power in real business.
For Zizixin Yuan, the adaptation of the domestic computing power ecosystem is a key entry point for verifying its technical paradigm.
In the long run, it aims to build a general computing acceleration layer that can span models, frameworks, compilers, and hardware. Whether serving domestic chips or adapting to international mainstream heterogeneous hardware, it can complete scheduling and optimization in an automated manner.
The core value of the "next-generation computing infrastructure" as Zizixin Yuan defines it lies in using an AI-driven automated paradigm to reduce the cost of adapting computing tasks all the way from models down to hardware.
On the one hand, automated tuning could compress the delivery cycle of models on various heterogeneous computing platforms from months to a much shorter period. For model vendors, cloud providers, and industry customers, this directly affects the speed of model migration, private deployment, and new business launches.
On the other hand, when computing efficiency is improved and costs are reduced, scenarios that were previously difficult to implement due to high computing power costs or performance bottlenecks, such as complex scientific computing and high-precision industrial simulation, will have new implementation opportunities.
Letting AI take over part of the optimization work of the computing system is a new solution for the leap in computing efficiency, and this is exactly the area where Zizixin Yuan enters.
Using AI and Operations Research to Give the Computing System an "Autopilot" in a Needle-in-a-Haystack Solution Space
Based on a deep understanding of the pain points in computing power, Zizixin Yuan positions itself as an "AI for Computing" company, committed to reconstructing the computing acceleration infrastructure through AI-driven automation technology to achieve a systematic improvement in computing efficiency.
In Ding Tian's view, the core challenge of computing acceleration is whether it is possible to find the optimal implementation path under the physical constraints of a specific chip.
This is an extremely complex solution space. The number of possible implementation paths for a specific computing task on a particular chip can be astronomical. How data is partitioned, how memory is scheduled, and how hardware parallelism is configured: even a small change in any one of these dimensions can lead to completely different performance results. More importantly, these dimensions are tightly coupled and vary greatly across chip architectures, so no fixed rule solves every case.
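To give a rough sense of scale, the sketch below counts the configurations for a single hypothetical kernel. The knob names and value ranges are illustrative assumptions chosen for this example; they are not taken from KernelCAT or from any particular chip.

```python
from itertools import product
from math import prod

# Hypothetical tuning knobs for one kernel (names and ranges are assumptions).
SEARCH_SPACE = {
    "tile_m": [16, 32, 64, 128, 256],      # data partitioning along rows
    "tile_n": [16, 32, 64, 128, 256],      # data partitioning along columns
    "tile_k": [8, 16, 32, 64],             # reduction-dimension chunk size
    "num_stages": [1, 2, 3, 4],            # software pipelining depth
    "num_workers": [1, 2, 4, 8],           # parallel workers per block
    "vector_width": [1, 2, 4, 8],          # elements per memory access
    "layout": ["row_major", "col_major"],  # memory layout
}


def count_configs(space):
    """Number of distinct configurations: the product of the knob sizes."""
    return prod(len(choices) for choices in space.values())


def iter_configs(space):
    """Lazily enumerate every configuration as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


single_kernel = count_configs(SEARCH_SPACE)  # 12,800 for the ranges above
# If ten such kernels had to be tuned jointly, the sizes would multiply.
ten_kernels_jointly = single_kernel ** 10

print(f"one kernel: {single_kernel:,} configurations")
print(f"ten kernels tuned jointly: about {ten_kernels_jointly:.2e}")
```

Even this toy space has 12,800 points for one kernel, and joint tuning grows it exponentially, which is why exhaustive enumeration on real hardware quickly becomes impractical.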
This is like planning the fastest route in a constantly changing city. The roads, traffic lights, traffic flow, and destinations are all changing. The route that worked yesterday may not be the best today. The same is true for computing acceleration. The real difficulty lies in finding the best implementation method under constantly changing constraints.
Therefore, computing acceleration is essentially an optimization problem under complex hardware constraints.
To solve this problem, Zizixin Yuan has established a core technical paradigm of "large model + operations research optimization + algorithmic automatic discovery".
The three layers are integrated because each covers a gap the others leave. Large models are good at understanding computing requirements, identifying performance bottlenecks, and quickly generating candidate solutions, but they cannot solve problems precisely in high-dimensional parameter spaces. Operations research optimization algorithms make up for exactly this shortcoming, efficiently approaching the optimal solution among a vast number of parameter combinations. Algorithmic automatic discovery then lets the system explore new computing strategies on its own rather than simply reusing known experience. Only when the three work together can the chip's performance limit be truly approached.
Ding Tian made an analogy to 36Kr. Taking solving a complex physics problem as an example, the large model is the person responsible for reading the question and determining the problem-solving direction. It can understand what the question is asking, identify the known conditions and constraint relationships, and provide a general problem-solving idea. However, to calculate the accurate result, the problem needs to be transformed into a solvable mathematical model, and then searched and verified step by step through operations research optimization algorithms. If the direction is judged incorrectly, the subsequent calculations are meaningless; if only the direction is determined, the problem cannot be truly solved.
In the system built by Zizixin Yuan, the large model is responsible for understanding computing requirements, identifying performance bottlenecks in the code, and formulating preliminary computing strategies and code. Subsequently, the operations research optimization algorithm takes over the specific parameter configuration and scheduling optimization work.
Through continuous on-board verification and testing on the hardware, the two continuously iterate and converge in the "hardware-in-the-loop" feedback mechanism, and finally lock in the optimal solution that can maximize the performance of the chip.
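The loop described above can be sketched in miniature. The following is an illustrative toy, not KernelCAT's implementation: `propose_seeds` stands in for the large model's candidate strategies, a simple neighborhood search stands in for the operations research optimizer, and `measure_latency_us` is a synthetic cost formula standing in for real on-board measurement. All names, ranges, and the cost formula are assumptions made for this example.

```python
# Hypothetical tuning space (assumed for illustration).
SPACE = {
    "tile": [16, 32, 64, 128],
    "stages": [1, 2, 3, 4],
    "workers": [1, 2, 4, 8],
}


def measure_latency_us(cfg):
    """Stand-in for an on-board run; a synthetic model with coupled knobs."""
    tile, stages, workers = cfg["tile"], cfg["stages"], cfg["workers"]
    compute = 4096 / (tile * workers)         # more parallel work, less time
    memory = 0.5 * tile / stages              # big tiles stall unless pipelined
    overhead = 2.0 * stages + 1.5 * workers   # pipelining and sync costs
    return compute + memory + overhead


def propose_seeds():
    """Stand-in for the large model: a few plausible starting strategies."""
    return [
        {"tile": 16, "stages": 1, "workers": 1},
        {"tile": 128, "stages": 4, "workers": 8},
    ]


def neighbors(cfg):
    """Configurations that differ from cfg by one step in one knob."""
    for key, choices in SPACE.items():
        i = choices.index(cfg[key])
        for j in (i - 1, i + 1):
            if 0 <= j < len(choices):
                yield {**cfg, key: choices[j]}


def refine(seed):
    """Measure neighbors, move to the best one, stop when none improves."""
    best, best_t = seed, measure_latency_us(seed)
    while True:
        cand = min(neighbors(best), key=measure_latency_us)
        cand_t = measure_latency_us(cand)
        if cand_t >= best_t:
            return best, best_t
        best, best_t = cand, cand_t


def tune():
    """Refine every seed using measured feedback and keep the fastest result."""
    return min((refine(seed) for seed in propose_seeds()), key=lambda r: r[1])


if __name__ == "__main__":
    cfg, latency = tune()
    print(cfg, f"{latency:.2f} us")
```

A neighborhood search like this only guarantees a local optimum, which is one reason the quality of the starting candidates matters; a production system would presumably use far stronger solvers and real hardware measurements.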
This new technical route is equivalent to equipping the computing system with "autopilot". It breaks the traditional development path that relies on engineers to manually write code, manually tune parameters, and repeatedly make trial and error, enabling the underlying system to have the ability to independently explore algorithm implementation, control resource scheduling, and automatically deliver computing acceleration solutions.
KernelCAT: The Intelligent Acceleration Engine for Building the Computing Power Base
Theoretical feasibility ultimately depends on the delivery effect.
Based on the technical paradigm of "large model + operations research optimization + algorithmic automatic discovery", Zizixin Yuan has launched its core commercial product, KernelCAT, a computing acceleration agent that the company defines as core infrastructure for the computing era.
Once KernelCAT takes over a computing acceleration task, it first builds a global understanding of that task on its own. It looks past the surface requirements to understand the model architecture and the logic of the computational graph, and locates the real bottleneck in the business workload. Combined with its understanding of the target hardware's microarchitecture, it then works out a performance optimization path under the combined constraints of latency, throughput, and power consumption.
After this global analysis, KernelCAT turns the high-level strategy into low-level execution. It works directly at the instruction set architecture level, dynamically generates highly optimized code for the task at hand, and sets up closed-loop verification to ensure correctness. Faced with complex parameter configuration and scheduling logic, it reformulates them as a solvable mathematical model and uses operations research optimization algorithms to search for the optimal solution in a space of billions of combinations.
Theoretical computing power must be tested in the real physical world. KernelCAT deploys the candidate strategies to the target chip on its own and captures fine-grained performance data such as execution time, device memory reads and writes, and compute unit utilization. The process is dynamic: once it detects a memory access bottleneck, the system adjusts its data partitioning and scheduling strategies; if the compilation or execution results fall short, it automatically returns to the previous stage to regenerate, verify, and converge.
At this point, "analysis - coding - on-board tuning - delivery" has been reshaped by KernelCAT into a fully automated intelligent closed-loop, with each link closely connected. The work that previously relied on top engineers to repeatedly make trial and error is now autonomously controlled by the intelligent agent.
More importantly, the system is "self-evolving" with each round of optimization: it continuously distills top experts' tuning intuition and complex software and hardware constraint rules into its underlying knowledge, building a core asset library that gets smarter with use and can be reused at scale.
As of now, KernelCAT has completed the automated tuning of multiple types of heterogeneous operators and achieved good test results.
Take the migration of the attention operator in the vLLM framework as an example. This operator is inherently difficult to optimize. Zizixin Yuan used KernelCAT to automatically complete a high-performance migration from GPU to Ascend NPU. With 100% accuracy alignment preserved, the running time was cut from 132 microseconds to 10.6 microseconds, a roughly 12-fold speedup (132 / 10.6 ≈ 12.5). The results have been included in the official Ascend Triton operator library.
In model and scenario-level delivery, KernelCAT can also support the smooth switching of multiple types of complex business loads. In the real production environment, making a single operator run faster is only part of the problem. The more difficult part is whether different architectures can be stably adapted and continuously delivered.
This method has also been applied to edge-side and embodied intelligence scenarios. Compared with cloud-based large models, edge-side computing tasks are more fragmented, and the hardware constraints are more specific, requiring higher adaptation efficiency and performance stability.
Tuning an edge-side embodied intelligence model is a bit like fitting a complex production line into a small workshop. Space is limited, the process cannot become chaotic, accuracy cannot drop, and the speed has to go up. Take the deployment and tuning of the Pi0.5 VLA embodied intelligence model on the Ascend 310P1 development board as an example: KernelCAT completed the basic deployment in 1 day and automatically implemented full-stack optimizations such as "empty camera cropping", "KV cache reuse", and "D2D zero copy" within 1 week. While maintaining an accuracy of 99.9999%, end-to-end inference performance was raised to more than twice that of the best community implementation.
From operator migration to edge-side model deployment, KernelCAT is actually dealing with the same type of problem. It connects computing tasks, optimization processes, and real hardware feedback, enabling the underlying engineering that was previously completed through manual tuning to enter a computing acceleration pipeline that can run, reuse, and iterate automatically.
From Computing Power Adaptation to a General Computing Acceleration Platform
Currently, computing demand is growing rapidly, while computing power supply is becoming more and more fragmented. Chip manufacturers, cloud providers, model manufacturers, and government and enterprise customers focus on different indicators, but they all ultimately point to the same problem, that is, how to make the computing power of different architectures easier to call and release higher efficiency in real business.
In this context, helping the computing power ecosystem solve the underlying engineering shortcomings is only the first step for Zizixin Yuan to enter the market, and it is also a more convincing verification scenario for its automated tuning technology. However, the business nature of Zizixin Yuan is not limited to filling the gaps in a single ecosystem. As the acceleration engine driven by AI and operations research becomes increasingly mature, it is becoming an engineering capability that connects computing power supply and model applications, providing a standardized computing power delivery method for the industry.
With the continuous growth of global computing demand, heterogeneous computing has become an irreversible trend. In the future, various GPUs, NPUs, TPUs, and dedicated inference chips will jointly form the underlying computing power pool. In the era of highly coupled software and hardware, the real challenge the industry faces is how to break the adaptation barriers between different hardware architectures, enabling upper-layer applications to stably and efficiently call the underlying computing power without being repeatedly restricted by specific chips and software stacks.
The long-term goal of Zizixin Yuan is to build a general acceleration layer across multiple hardware architectures through engines such as KernelCAT, advancing computing acceleration from customized services for specific chips to a more general computing infrastructure.
To achieve this system-level vision, the challenges are still great.
AI for Computing has long been in an uncharted area. Traditional methods are prone to getting stuck when facing extremely high-dimensional operator spaces. Zizixin Yuan chooses to go deep into the engineering front line, breaking down complex computing power bottlenecks into problems that can be searched, verified, and iterated, and then advancing cutting-edge computing algorithms into real hardware and business scenarios through high-frequency closed-loops.
The ability to dive into the deep end of technology stems from the "academic + engineering + business" triple background of the Zizixin Yuan team.
On the academic side, they have the "brain" for "optimization". Relying on the Shenzhen Big Data Research Institute and with the support of Academician Luo Zhiquan, a famous optimization theory expert, the team masters the underlying mathematical methods for finding the "optimal solution" in billions of operator combinations.
On the engineering side, they have the "hands" to implement. The backbone of the team has extensive front-line infrastructure experience at companies such as Huawei and is well aware of the limitations of theoretical algorithms. They know how to make models, compilers, and underlying chips truly work together to achieve peak performance under real business workloads.
On the business side, they insist on "building machines" rather than being "craftsmen". From the beginning, Zizixin Yuan has deliberately avoided earning "headcount fees" and has turned down traditional customized manual tuning services. Instead, it distills every pitfall encountered and every hard-won optimization into intelligent products such as KernelCAT. This means that what it ultimately delivers is an automated acceleration engine that can be reused, verified, and delivered at scale.
Academia provides the methods, engineering provides the implementation, and business discipline provides the restraint. Together, the three form a solid competitive barrier for the Zizixin Yuan team in the AI for Computing track.
Every leap in computing efficiency stems from the reconstruction of the underlying infrastructure. From code and software stacks to computing power systems, humans have always been pushing toward one thing: continuously lowering the threshold for using complex computing, turning computing power from an ability that only a few experts can master into a productive force that can be automatically called and continuously optimized. Zizixin Yuan positions itself in the system adaptation link, the one that is hardest to notice but directly determines whether computing power can be put to use, replacing traditional manual tuning with AI automation and