In the first year of the global Agent PC, what lets domestic AI PCs "take the lead"?
At the just-concluded GTC Taipei conference, Jensen Huang said that for the past 40 years, people have used PCs by opening applications, clicking, and typing. Now, Microsoft and NVIDIA are going to reinvent the PC.
He demonstrated a computer that can run a personal Agent 24/7, making it clear to the public that AI is moving from the era of large language models to the era of Agentic AI.
The role of the PC is also changing: from a tool that passively waits for user operations into a personal computing hub capable of understanding context, reasoning and planning, and invoking tools. Jensen Huang regards this change as the most important underlying reconstruction of the PC since Windows 95.
At almost the same time, a domestic AI PC, the Great Wall N90 Pro, was officially launched. Its positioning is similar to that of the Agent Computer Jensen Huang demonstrated: it is designed around the Agent from the outset, and it runs large models smoothly on the edge side, locally, within a thin and light body.
Two technological routes advancing in parallel reach the same conclusion: edge-side computing power is the ticket into the Agent era.
So, in concrete terms, how does the domestic solution differ along the three dimensions of computing power supply, economic feasibility, and security boundaries?
01. Reinventing the PC: What does an Agent-native PC need?
Jensen Huang breaks down the Agent Computer into three necessary conditions.
The first is sufficient local computing power, because the Agent must handle multiple model invocations and inferences simultaneously, with parameter scales reaching tens of billions. The second is a security sandbox, which ensures that the Agent runs in a protected environment and cannot access the whole system's resources at will. The third is the Agent runtime: the middleware that understands user intent, decomposes tasks, and invokes tools.
These three conditions are necessary because the working mode of the Agent is completely different from that of traditional software. The execution path of traditional software is linear: when a user clicks a button, the software executes a function and then ends.
The operation of the Agent is cyclical: it receives a vague instruction, decomposes it into multiple steps on its own, invokes different tools, and adjusts the next action based on intermediate results until the task is completed. In this process, each inference requires computing power, each tool invocation requires permission management, and each transition between steps must be scheduled by the runtime.
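The cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular Agent runtime is built: the tool names, the allowlist standing in for the security sandbox, and the `plan_next_step` function standing in for model inference are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools; a real Agent would call applications or system APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_notes": lambda query: f"notes about {query}",
    "summarize": lambda text: f"summary of ({text})",
}

# Stand-in for the security sandbox: only tools on this allowlist may run.
ALLOWED_TOOLS = {"search_notes", "summarize"}


def plan_next_step(goal: str, history: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Toy replacement for model inference.

    Returns the next (tool, argument) pair, or None when the task is done.
    It looks at intermediate results to decide what to do next.
    """
    if not history:
        return ("search_notes", goal)
    if len(history) == 1:
        previous_result = history[-1][1]
        return ("summarize", previous_result)
    return None


def run_agent(goal: str, max_steps: int = 10) -> List[Tuple[str, str]]:
    """Runtime loop: infer, check permission, invoke the tool, repeat."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        step = plan_next_step(goal, history)      # each pass costs one inference
        if step is None:
            break                                 # task completed
        tool_name, argument = step
        if tool_name not in ALLOWED_TOOLS:        # permission management
            raise PermissionError(f"tool not permitted: {tool_name}")
        result = TOOLS[tool_name](argument)       # tool invocation
        history.append((tool_name, result))       # result feeds the next step
    return history


if __name__ == "__main__":
    for tool_name, result in run_agent("Q3 meeting"):
        print(tool_name, "->", result)
```

Even in this toy form, a single vague instruction triggers several inference passes, which is why the computing power requirement grows with the number of steps rather than with the number of user clicks.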
Of the three conditions, computing power is where the industry sought a breakthrough first.
In 2024, when Microsoft proposed the Copilot+ PC standard, it required only 40 TOPS. At the time, the industry generally considered that sufficient, but two years on, that judgment has been overturned. From OpenClaw's desktop automation to intelligent meeting assistants, large AI models have changed from chat tools into real productivity tools. A single task requires multiple inferences, and small models are simply not enough. The industry now generally regards models with 35B or more parameters as merely entry-level.
The growth rate of computing power demand far exceeds the iteration speed of chips: it takes about two years to update a generation of chips, while current AI applications and multimodal large models undergo significant changes every few months.
The impact of this mismatch in pace is already visible in the industrial chain. Leading enterprises in the industry estimate that about 70% to 80% of AI computing power currently goes to training and 20% to 30% to inference, and that this ratio will reverse in the future. Data from TrendForce also shows that the AI training computing power of the top five North American cloud service providers is expected to increase by 56% in 2026, while inference computing power will soar by 122%.
Once the computing power is improved, power consumption becomes a new problem.
In traditional solutions, when the computing power increases from dozens of TOPS to hundreds of TOPS, the power consumption and size of the chip increase linearly, making it impossible to fit into a thin and light notebook.
(Image: Great Wall N90 Pro AI PC)
The answer given by the Great Wall N90 Pro is: start from the requirements. First, figure out what the notebook needs, and then select the chip.
Many AI chips were originally designed for data centers, with power consumption of hundreds of watts and large footprints. Once moved into terminal devices, heat dissipation, battery life, and noise all become problems. The M50 chip used in the Great Wall N90 Pro is not a server-derived solution.
The M50 chip comes from Houmo Intelligence. Its key underlying technology is "computing-in-memory". In traditional chips, computing and storage are separate, and data must be moved constantly between them, which consumes a large amount of energy. Computing-in-memory integrates computing and storage closely, eliminating long-distance data transfer and significantly reducing power consumption.
While still running a 35B model locally, the M50 chip keeps its power consumption at around 10W, and the entire board draws less than 15W. In other words, it can be plugged directly into an M.2 interface and put to work, just like installing an ordinary solid-state drive.
In the Agent Computer era, then, the domestic answer to the edge-side computing power problem is clearly "demand-oriented". Instead of forcing server chips into notebooks, it designs a chip specifically for notebooks based on real terminal scenarios. Engineering problems such as power consumption control, heat dissipation design, and battery life balance are considered from the design stage.
Great Wall chose to cooperate with Houmo Intelligence and carry out in-depth joint optimization partly because it values Houmo's ability to take computing-in-memory from concept to mass production.
A chip drawing 10W enables a thin and light notebook weighing just over 1 kg to run a 35B-parameter large model smoothly and locally. In the past, this required a GPU drawing over 500W and a full-size tower workstation; now an ordinary notebook is sufficient.
Once computing power and power consumption are "sufficient", the next necessary condition is security. By its nature, the Agent's work depends on data, and local computing power has a natural advantage here: data does not leave the terminal.
Agent tasks often involve sensitive information such as meeting minutes, personal knowledge bases, and office documents. Once cloud-based processing is involved, compliance risks are magnified. When the Agent runs on the edge side, data stays in a local closed loop from input to output, which secures data and compliance at the physical level and is a prerequisite for the Agent Computer to be deployed across a wide range of scenarios.
Jensen Huang also repeatedly emphasizes the importance of security. The global AI industry has realized that security is a must-have for the popularization of Agents.
By 2026, there is enough market data to measure how quickly AI PCs are spreading. Gartner predicts that global AI PC shipments will reach 143 million units in 2026, accounting for 55% of the entire PC market, which means AI PCs may soon overtake traditional PCs as the mainstream purchase.
The Chinese market is moving even faster and has become the core engine driving the market. IDC predicts that although overall PC shipments in China are expected to decline by 0.8% in 2026, AI PC shipments will surge by 146.5% year-on-year, with a compound annual growth rate of 58.7% over the next five years. By 2029, AI PCs are expected to account for 36.5% of the overall PC market.
Operating systems are also keeping up in support of local computing power. Successive Windows 11 updates from Microsoft have added a large number of AI functions, and domestic operating system vendors such as Kylin have begun to integrate local Agent capabilities.
From chips and complete machines to operating systems and Agent applications, the entire industrial chain is preparing for Agent-native PCs.
02. Calculating the "Token Account": How important is edge-side computing power?
The computing power question is whether a model can run at all; Token cost determines where it is most cost-effective to run it.
In 2026, as Agents are widely deployed, this question has begun to reshape the business logic of AI computing as a whole. Jensen Huang proposed Token economics at GTC 2026 in March. He divided Token services into five levels:
The free level is used to attract users; the basic level costs about $3 per million Tokens and serves ordinary users; the advanced level costs about $6 per million Tokens and provides larger models and faster speeds; the high-speed level costs about $45 per million Tokens and supports long-context and in-depth inference; the top level costs about $150 per million Tokens and is for ultra-long research tasks and real-time responses on critical paths.
He worked through an example: a researcher uses 50 million Tokens per day. Even priced at $150 per million, he argued, that is acceptable for a research team.
Tokens are not a one-time purchase. As long as AI is running, Tokens are being consumed. When Agent applications are fully rolled out, the monthly Token bill for an enterprise-level AI application can easily reach hundreds of thousands of dollars.
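To make the arithmetic concrete, here is a small sketch that prices the researcher example at each paid level. The per-million prices and the 50 million Tokens per day come from the figures quoted above; the function names and the 30-day month are assumptions made only for illustration.

```python
# Approximate USD price per million Tokens for the paid levels quoted above.
PRICE_USD_PER_MILLION = {
    "basic": 3,
    "advanced": 6,
    "high_speed": 45,
    "top": 150,
}


def daily_cost_usd(tokens_per_day: int, level: str) -> float:
    """Cost of one day's Token usage at the given service level."""
    return tokens_per_day / 1_000_000 * PRICE_USD_PER_MILLION[level]


def monthly_cost_usd(tokens_per_day: int, level: str, days: int = 30) -> float:
    """Cost over a month, assuming constant usage on every one of `days` days."""
    return daily_cost_usd(tokens_per_day, level) * days


if __name__ == "__main__":
    researcher_tokens = 50_000_000  # 50 million Tokens per day, as in the example
    for level in PRICE_USD_PER_MILLION:
        print(level, daily_cost_usd(researcher_tokens, level))
```

At the top level, the example works out to $7,500 per researcher per day. If that usage continued every day of a 30-day month, one researcher alone would account for $225,000, which is the scale of bill the paragraph above describes.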
In March 2026, Alibaba established the Token Hub business group, with CEO Wu Yongming personally in charge. This shows that Token management has shifted from a technical issue to a business strategy issue. Many domestic cloud service providers have adjusted, or are adjusting, their API call prices, and the price per million Tokens for some models has increased multiple times in a short period.
It is foreseeable that in addition to being a billing unit, Tokens can also be directly exchanged for scarce business resources.
The business logic of edge-side computing power becomes clear here: pay once for the AI PC hardware, and subsequent basic inference incurs no Token fees. That promise is clearly attractive.
Agents multiply Token consumption, and the zero-marginal-cost advantage of the edge side has turned from theory into reality. A frequently cited comparison: a high-end AI PC costs about 10,000 to 20,000 RMB in hardware, while a team that invokes cloud APIs heavily every day may spend more than that on Token fees within a few months.
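The comparison above is a break-even calculation. A minimal sketch follows; the hardware price range is taken from the text, while the monthly cloud Token fee of 5,000 RMB is purely a hypothetical input, since no specific figure is given.

```python
def break_even_months(hardware_cost_rmb: float, monthly_token_fee_rmb: float) -> float:
    """Months of cloud Token fees needed to equal a one-time hardware purchase.

    Ignores electricity, maintenance, and any remaining cloud usage.
    """
    if monthly_token_fee_rmb <= 0:
        raise ValueError("monthly_token_fee_rmb must be positive")
    return hardware_cost_rmb / monthly_token_fee_rmb


if __name__ == "__main__":
    hypothetical_monthly_fee = 5_000  # RMB per month, an assumed value
    for hardware_cost in (10_000, 20_000):
        months = break_even_months(hardware_cost, hypothetical_monthly_fee)
        print(f"{hardware_cost} RMB hardware pays back in {months} months")
```

Under that assumed fee, the hardware pays for itself in two to four months. The heavier the team's daily API usage, the shorter the payback period becomes.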
Some people in the industry summarize the boundary between local and cloud inference in three lines.
The first is model size: models with 120B or fewer parameters can already run locally. The second is security and confidentiality: scenarios involving privacy and sensitive data must be processed locally. The third is commercialization: for Agent scenarios with high-frequency Token usage, local inference completely avoids cloud-based pay-as-you-go billing.
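The three lines can be read as a simple routing rule. The sketch below is one possible reading, not an industry standard: the `Task` fields, the daily Token threshold for "high-frequency" usage, and the fallback to the cloud for low-frequency, non-sensitive tasks are all assumptions.

```python
from dataclasses import dataclass

LOCAL_MAX_PARAMS_B = 120  # line 1: models up to 120B parameters can run locally


@dataclass
class Task:
    model_params_b: float  # model size, in billions of parameters
    sensitive: bool        # involves privacy or sensitive data
    daily_tokens: int      # expected Token usage per day


def route(task: Task, high_frequency_tokens: int = 1_000_000) -> str:
    """Decide where a task runs, following the three lines in order of strictness.

    `high_frequency_tokens` is an assumed threshold, not a figure from the text.
    """
    fits_locally = task.model_params_b <= LOCAL_MAX_PARAMS_B
    if task.sensitive:
        # line 2: sensitive data must stay local; flag models that do not fit
        return "local" if fits_locally else "local-required-but-too-large"
    if fits_locally and task.daily_tokens >= high_frequency_tokens:
        # line 3: heavy Token usage avoids pay-as-you-go billing by staying local
        return "local"
    return "cloud"


if __name__ == "__main__":
    print(route(Task(model_params_b=35, sensitive=True, daily_tokens=0)))
    print(route(Task(model_params_b=35, sensitive=False, daily_tokens=5_000_000)))
    print(route(Task(model_params_b=400, sensitive=False, daily_tokens=5_000_000)))
```

The security line is checked first because it is the only one stated as a hard requirement; the other two are questions of feasibility and cost.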
Based on these three lines, a judgment is being formed: in the future, 80% of inference scenarios will move to the local side.
This judgment is supported by a growing body of evidence. Omdia data shows that, with a distributed architecture that dynamically schedules workloads across cloud, edge, and device, placing 80% of lightweight tasks locally can cut the annual cloud cost for 100 million users from $5.5 billion to $1.2 billion, a saving of more than $4.3 billion. The calculation assumes 50 AI requests per person per day and a typical cost of $0.003 per request.
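The baseline figure can be reproduced from the stated inputs. The sketch below assumes a 365-day year and a uniform cost per request, neither of which is spelled out in the cited data; the function name is illustrative.

```python
USERS = 100_000_000
REQUESTS_PER_USER_PER_DAY = 50
COST_PER_REQUEST_USD = 0.003
DAYS_PER_YEAR = 365  # assumption


def annual_cloud_cost_usd(local_share: float) -> float:
    """Annual cloud bill when `local_share` of requests is handled locally.

    Assumes every request costs the same, which the cited data may not.
    """
    total_requests = USERS * REQUESTS_PER_USER_PER_DAY * DAYS_PER_YEAR
    cloud_requests = total_requests * (1 - local_share)
    return cloud_requests * COST_PER_REQUEST_USD


if __name__ == "__main__":
    print(f"all cloud: ${annual_cloud_cost_usd(0.0) / 1e9:.3f} billion")
    print(f"80% local: ${annual_cloud_cost_usd(0.8) / 1e9:.3f} billion")
```

With everything in the cloud, the bill comes to about $5.475 billion, matching the quoted $5.5 billion. Moving 80% of requests local under a uniform per-request cost leaves about $1.1 billion, slightly below the $1.2 billion cited. The quoted figures do not explain the gap; one possible reason is that the tasks remaining in the cloud are heavier, and therefore costlier, than average.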
For enterprises and Agent application developers, this is a figure that cannot be ignored. For individual users, edge-side computing power further lowers the threshold for using AI. When they use Agent capabilities for mature inference tasks and stable everyday workflows, there is no need to buy expensive cloud computing quotas or to worry about a huge bill at the end of the month. Once the device is purchased, the AI capabilities are available locally.
Following the logic of Token economics, the appeal of edge-side computing power is starting to be validated widely.
For example, NVIDIA released the PC superchip RTX Spark for Windows. PC makers such as Dell, Lenovo, HP, ASUS, and Acer are all on the list for the first batch of products. A common selling point of these products is that they can run AI locally without consuming cloud Token quotas.
Domestic manufacturers are also moving quickly. In this round of edge-side computing power deployment, the launch of the Great Wall N90 Pro is a concrete market move. Backed by the mass-produced M50 computing-in-memory chip, it runs a 35B model smoothly and locally. This means that for the high-frequency Agent instructions users issue, Token consumption happens entirely on the device, without incurring any cloud-based call fees.
(Image: Houmo Manjie M50 chip, Liqing LQ50 M.2 card)
That is to say, with the support of the operating system and AI applications, once a Great Wall N90 Pro is purchased, the cost of subsequent daily inferences is almost zero.
Edge-side computing power has thus been revalued in the Agent era. It was once regarded as a cheap alternative to cloud computing; now, as Token consumption rises, it has become an indispensable layer of computing infrastructure.
Jensen Huang compares Tokens to the oil of the digital world. By that analogy, edge-side computing power plays the role of distributed energy nodes with their own oil fields: they do not rely on pipelines, yet they can independently meet all the needs of local users.
When the oil price keeps rising, the value of owning an oil field becomes prominent.
03. How high can domestic PC makers go?
Although global AI PC technology is still dominated by overseas giants such as NVIDIA and Microsoft, domestic PC solutions, which started at almost the same time, are quietly moving from following to running alongside.
The mass production and delivery of the Great Wall N90 Pro means more than the launch of a product in the Agent Computer form: it also represents a complete verification of the entire domestic technology stack.
PC users in the Chinese market have long been more receptive to AI applications, have greater needs around daily work and efficiency, and are more sensitive to data privacy and inference latency. This is the logic behind edge-side computing power's advantages being magnified in the domestic market.
The low latency, zero risk, and personalized experience of local AI have stimulated demand for PC replacement, which had been hit hard by mobile working. As a result, in this wave of AI PC upgrades, the competitive logic of domestic PC makers has also changed.
For a long time, the core narrative of domestic PCs revolved around security and controllability, or around being a cost-effective alternative. The emergence of AI PCs has changed how product forms are defined. Security is no longer a separate selling point; it is bundled in as a natural attribute of edge-side computing power.
Behind this transformation, domestic edge-side chips are also proving themselves through practical, deployable products. Take other products that, like the Great Wall N90 Pro, are equipped with the M50 chip: an AI mini-workstation with a 1L body achieves a computing power density of 640 TOPS/L with four M50 chips and can run mainstream local large models such as Qwen3.6 out of the box; the ultra-mini AI host P7 weighs only 300g and draws a maximum of 30W for the whole machine, yet supports local deployment of models with hundreds of billions of parameters.
These figures are in the first tier globally.
Some users have also said, after testing the Great Wall N90 Pro, that it is "the fastest-running AI PC they have ever seen, even faster than many models running on large desktop GPUs."
There is no need to use the rhetoric of domestic substitution to prove itself. The product itself is the best answer.
Domestic PC makers also have their own methodology for technology selection. Taking Great Wall as an example, when selecting edge-side AI chips, it values not just the computing