Tens of Millions of Yuan Secured in Two Financing Rounds from Matrix Partners and Fengrui Capital: A Post-2000s Team from Tsinghua University Aims to Ease Token Bill Anxiety | Exclusive Report by Intelligent Emergence
Text by | Wang Xinyi
Edited by | Deng Yongyi
The Wange Zhiyuan team carries several labels: post-2000s, a team of PhDs, and technology-focused.
CEO Wang Guanbo fits all of them. He is a doctoral student in the Department of Computer Science at Tsinghua University and a serial entrepreneur born in the 2000s.
The team is young and has about 20 members. Nearly 90% were born in the 2000s, and most are master's and doctoral students from universities such as Tsinghua and Peking University. There are also members from companies such as Amazon, OpenAI, and ByteDance.
According to exclusive information obtained by "Intelligent Emergence", Wange Zhiyuan recently completed two financing rounds, an angel round and an angel+ round, totaling tens of millions of yuan, with participation from Five-Source Capital and Fengrui Capital. Yuanhe Capital served as the exclusive financial advisor. The funds will be used for product R&D and market promotion.
In the past, cloud-based computing power was almost the default choice. With the explosion in the capabilities of agents such as Claude Code, Codex, and OpenClaw, demand for tokens has grown explosively as well.
Wang Guanbo said that none of the inference engines on the market are well suited to the edge side. Most existing engines focus on improving speed while ignoring their huge memory consumption.
On the edge side, the memory provided by chip manufacturers usually does not exceed 32GB, since a device with more memory fits fewer usage scenarios. Manufacturers therefore want their chips to run inference faster and support larger models within the existing memory, without raising hardware cost.
Based on this, Wange Zhiyuan has proposed a combined solution, the edge-side computing power engine cPilot plus the intelligent platform Amis, to give users tokens that are both affordable and useful:
In terms of cost, it lets small-memory machines run larger models, greatly reducing the hardware cost of model deployment. In terms of performance, it targets large models on the edge side rather than small ones and provides a local deployment solution that can meet users' needs.
"Under the same memory overhead, some solutions can run a model in a low-memory environment only by sacrificing speed, accuracy, and so on. In contrast, our edge-side inference solution is at least 12 times faster," Wang Guanbo told "Intelligent Emergence".
In 2025, they spent almost the whole year making their products compatible with chips from various manufacturers. At that time, consumers (the C-end) showed no strong demand for edge-side intelligence.
This year, the popularity of tools like OpenClaw has shown them the possibility of a To C market.
Wang Guanbo said that Wange Zhiyuan's main customers are currently B-side chip manufacturers. The company works with them to develop terminal hardware, installs its edge-side computing power engine and its self-developed Lobster product locally on devices such as AI mini PCs, AI PCs, and AI NAS, and provides a set of edge-side computing power optimization solutions. It also pre-installs a platform that can deploy models with one click and aggregate APIs, meeting C-end customers' demand for local deployment of large models.
At present, Wange Zhiyuan's business model focuses mainly on the B side. Through this B to C practice, it is gradually verifying and establishing a C-side business model.
Wange Zhiyuan's cooperation with several hardware manufacturers has now entered the delivery stage. Tens of thousands of devices are expected to ship with its software pre-installed this year, and the company expects revenue of more than ten million yuan this year.
Don't Do Edge-Side Small Models
In the current large-model market, the price war is in full swing.
Recently, DeepSeek announced a 75% cut to the API price of DeepSeek-V4-Pro. Lei Jun also announced price cuts for the MiMo V2.5 series of models, with reductions of up to 99%.
The consensus behind this is that AI has truly entered many productivity scenarios, and users increasingly want good models at low cost.
Wange Zhiyuan shares this view. It targets the capabilities of edge-side hardware so that users can run large-parameter models locally, which addresses the cost problem at its root: apart from the hardware cost, the token cost is zero once a model is deployed locally.
From the very beginning, they set two rules. Don't do edge-side small models, because the market for small models is neither large enough nor general enough. Don't do post-training, because once the cloud-based model is updated, the knowledge gained through post-training will be directly overwritten.
Based on this idea, Wange Zhiyuan launched the edge-side AI inference engine cPilot.
cPilot is an engine aimed at the underlying ecosystem, an intermediate layer between the hardware below and the software above. Through self-developed algorithms, it compresses the memory footprint of a running model as far as possible and draws out the capabilities of the underlying hardware.
Under normal circumstances, a device with 32GB of memory can allocate only 8 to 10GB to model inference, so only a model of about 4B parameters can be deployed locally.
With the same hardware configuration, the cPilot computing power engine raises the size of the model that can be deployed on the edge side from 4B to 80B parameters. For one hardware manufacturer customer, adopting the cPilot solution can reduce the hardware cost of each machine by about 2,000 yuan while increasing the deployable parameter count several times over.
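The arithmetic behind these figures can be sketched roughly as follows. The bytes-per-parameter values (2 for FP16, 0.5 for 4-bit quantization) and the function names are assumptions for illustration, not figures or code from Wange Zhiyuan, and the estimate ignores the KV cache and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumptions (not from the article): FP16 = 2 bytes/param,
# 4-bit = 0.5 bytes/param, 1 GB = 1e9 bytes, no KV cache or overhead.

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold all weights, in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9


def resident_fraction(budget_gb: float, params_billion: float,
                      bytes_per_param: float) -> float:
    """Share of the weights that fits in the memory budget (capped at 1.0)."""
    needed = weight_memory_gb(params_billion, bytes_per_param)
    return min(1.0, budget_gb / needed)


print(weight_memory_gb(4, 2.0))        # 4B model at FP16
print(weight_memory_gb(80, 0.5))       # 80B model at 4-bit
print(resident_fraction(10, 80, 0.5))  # share of an 80B 4-bit model in 10 GB
```

Under these assumptions, a 4B model at FP16 needs about 8GB, which matches the 8 to 10GB budget, while an 80B model needs about 40GB even at 4-bit. Fitting it into the same budget would therefore require keeping only a fraction of the weights in memory at any moment.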
However, local model deployment is not a universal solution, and the capabilities of the edge side are always limited. Users' needs are also changing. As models grow stronger, users no longer blindly pursue the most capable model but call the model appropriate to the task.
For this reason, Wange Zhiyuan also recently launched the edge-side intelligent platform Amis, which connects to mainstream agent tools and models and lets users draw on cloud-based computing power.
Amis acts as an API aggregation platform and a scheduling center. Users can run agent tools such as OpenClaw and Hermes directly on Amis and flexibly connect to and switch between different models. The platform can also allocate cloud-based and local computing power automatically, switching between them according to factors such as task complexity.
The advantage is that most user needs are lightweight, high-frequency, token-consuming tasks that can be completed locally. Only a small number of complex tasks that are hard to solve on the edge side need to be sent to the cloud.
Users don't need to pay other model vendors separately; they can configure models directly on Amis. With scheduling between the edge and the cloud, most simple tasks are completed locally at zero token cost. Only 10% to 20% of tasks need to be sent to the cloud, which greatly reduces the cost.
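The edge-cloud scheduling described here can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Task` fields, the complexity score, the 0.8 threshold, and the price are assumptions for illustration and do not describe how Amis is actually implemented.

```python
# Hypothetical edge-cloud routing sketch. The complexity score, threshold
# and price are illustrative assumptions, not Amis internals.
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    complexity: float  # assumed score in [0, 1] from some upstream estimator
    tokens: int


def route(task: Task, threshold: float = 0.8) -> str:
    """Send a task to the cloud only when its complexity exceeds the threshold."""
    return "cloud" if task.complexity > threshold else "local"


def cloud_bill(tasks: list[Task], price_per_million_tokens: float,
               threshold: float = 0.8) -> float:
    """Token bill under routing: locally served tasks add no token cost."""
    cloud_tokens = sum(t.tokens for t in tasks if route(t, threshold) == "cloud")
    return cloud_tokens / 1_000_000 * price_per_million_tokens


# Ten tasks of equal size, two of them complex enough to go to the cloud.
tasks = [Task("simple", 0.2, 100_000) for _ in range(8)]
tasks += [Task("complex", 0.9, 100_000) for _ in range(2)]

print(cloud_bill(tasks, price_per_million_tokens=10.0))                  # routed
print(cloud_bill(tasks, price_per_million_tokens=10.0, threshold=-1.0))  # all cloud
```

With two of ten equally sized tasks routed to the cloud, the bill in this toy setup is one fifth of what it would be if every task went to the cloud, in line with the 10% to 20% share cited above.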
Wang Guanbo said, "We hope to better penetrate general C-end application scenarios. The ultimate goal of Amis is to let users build the habit of using the platform's ecosystem."
MoE Is Sparse Enough, but There Is Still Room for a Ten-Fold Reduction
Wang Guanbo believes that a market everyone can clearly see cannot be an opportunity for startups.
When the company started, MoE (Mixture of Experts) was not yet as influential as it is now, so Wange Zhiyuan chose to optimize the Dense (dense model) architecture on the edge side first.
At that time, many people thought open-source models were still relatively limited. Was it too early for Wange Zhiyuan to pursue edge-side intelligence?
In response, Wang Guanbo chose to bet boldly on uncertain user needs and industry trends.
The bet rests on three questions: first, on model capability, whether users will settle for models that solve their needs rather than always pursuing the highest quality; second, hardware cost, which is also the core barrier they decided to overcome; third, whether token usage will grow explosively.
Focusing on these three anchor points, Wange Zhiyuan started by optimizing hardware capabilities and reducing the memory a running model needs, carrying out full-stack optimization across the underlying hardware, the intermediate layer, and the algorithm software.
At the software and algorithm level, whether the model is Dense or MoE, only a subset of its parameters is effectively activated during inference. Even for MoE models, which already use a sparse structure, there is still room to reduce the activated parameters by about ten times.
Wange Zhiyuan therefore designed a set of "dynamic sparse activation algorithms" that predict which parameters the model needs to compute and load during inference, significantly reducing the number of parameters actually used.
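The article does not disclose how these algorithms work. As a generic illustration of the idea of predicting which parameter blocks to load, the toy sketch below keeps only the top-k predicted blocks in a small LRU cache. It is not Wange Zhiyuan's algorithm; `BlockCache`, `top_k_blocks`, `sparse_step`, the scalar "weights", and the predictor scores are all hypothetical.

```python
# Toy illustration only, NOT Wange Zhiyuan's algorithm: a predictor scores
# parameter blocks, and only the top-k blocks are loaded and computed.
from collections import OrderedDict


class BlockCache:
    """Keeps at most `capacity` parameter blocks in memory (LRU eviction)."""

    def __init__(self, capacity, load_block):
        self.capacity = capacity
        self.load_block = load_block  # hypothetical loader: block_id -> weights
        self.resident = OrderedDict()
        self.loads = 0                # counts loads from slow storage

    def get(self, block_id):
        if block_id in self.resident:
            self.resident.move_to_end(block_id)  # mark as recently used
            return self.resident[block_id]
        weights = self.load_block(block_id)
        self.loads += 1
        self.resident[block_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)    # evict least recently used
        return weights


def top_k_blocks(scores, k):
    """Indices of the k highest predictor scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def sparse_step(x, scores, cache, k):
    """Weighted sum over only the k predicted blocks (scalar toy weights)."""
    out = 0.0
    for block_id in top_k_blocks(scores, k):
        out += scores[block_id] * cache.get(block_id) * x
    return out


cache = BlockCache(capacity=2, load_block=lambda i: float(i + 1))
scores = [0.1, 0.7, 0.05, 0.15]  # assumed predictor output for one step
print(sparse_step(1.0, scores, cache, k=2), cache.loads)
```

In this toy setup only two of the four blocks are ever loaded, and repeating the step with the same scores triggers no further loads, which is the sense in which accurate prediction reduces both computation and resident memory.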
At the edge-side hardware level, three bandwidths affect overall performance: memory bandwidth, CPU memory access, and CPU-GPU interaction. To deal with these three limits, Wange Zhiyuan built a scheduling system similar to CUDA (Compute Unified Device Architecture) that turns the hardware layer into an edge-side large-model inference platform and memory management system. It has also adapted the system to chips from different manufacturers.
According to Wang Guanbo, in testing they ran a 35B-parameter large model on a machine with an AMD chip, and memory usage was 27.6GB. On the same hardware with the cPilot engine, memory usage for the same model was compressed to 4.7GB.
This means that with less than 5GB of memory, users can run large models such as Qwen3.6 and Gemma 4, which are capable of coding and complex task processing.
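A quick calculation on the test figures quoted above shows what they imply per parameter. The only assumption added here is that 1 GB equals 1e9 bytes; the helper name is illustrative.

```python
# Arithmetic on the reported test figures; 1 GB is assumed to be 1e9 bytes.
PARAMS = 35e9        # 35B-parameter model
BASELINE_GB = 27.6   # reported memory usage without cPilot
CPILOT_GB = 4.7      # reported memory usage with cPilot


def bits_per_param(memory_gb, params):
    """Average number of resident bits per model parameter."""
    return memory_gb * 1e9 * 8 / params


compression = BASELINE_GB / CPILOT_GB
print(round(compression, 2))                          # 5.87
print(round(bits_per_param(BASELINE_GB, PARAMS), 2))  # 6.31
print(round(bits_per_param(CPILOT_GB, PARAMS), 2))    # 1.07
```

The reported figures correspond to a reduction of roughly 5.9 times, and to about one resident bit per parameter on average. One plausible reading, which the article does not state explicitly, is that only part of the weights is held in memory at any moment, consistent with the sparse activation approach described earlier.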
The Second Half of AI Is on the Edge Side
"In the past, the edge side was not really favored by everyone," Wang Guanbo told "Intelligent Emergence". "However, many investors have told us that a consensus has gradually formed across the investment community this year: the edge side may be the future."
Compared with the explosive growth of agent capabilities and token demand, manufacturers' cuts to token prices are little more than a drop in the bucket.
Wange Zhiyuan hopes the edge side can become the next computing paradigm, moving users from 'renting intelligence' to 'owning intelligence'.
In the long run, they believe tokens will be used much as WiFi is today. All hardware will be able to produce tokens locally, moving capabilities from the cloud to the edge side, and each edge device will be able to provide targeted services to the networks around it.
Currently, Wange Zhiyuan's services still focus on being the intermediate layer between software and hardware. Wang Guanbo said that this is only their first stage.
In the next stage, they may consider developing edge-side AI hardware themselves. "It's not the right time to focus on hardware yet," Wang Guanbo said.
On the one hand, chip technology has not yet converged. Today's GPUs are suited to model training but not to efficient inference. Entering hardware now would lock in the form factor and lead to relatively high iteration costs later. Next-generation chips, such as domestic NPUs, may bring major change on the chip side.
On the other hand, making hardware depends not only on technology and engineering but, more importantly, on supply-chain capabilities. "If we want to make hardware, we need to plan about 10 months in advance to line up the upstream and downstream supply chains and market sales," Wang Guanbo said. "Cooperating with B to C customers can also help us seize the ecological niche first."
"The AI wave will gradually recede next year. This 'receding' does not mean leaving the stage but reaching the edge side."
In the next stage, an application that can carry the token explosion will emerge on the edge side, and what they need to do is provide more downstream services for such applications. In the long run, they hope to make cPilot and Amis the most complete platforms in the low-memory track, cross-platform and ready to use.