Stop fixating on GPUs. Intel has made a major move. Can it end NVIDIA's monopoly on computing power?
For the past two years, AI hardware has revolved almost entirely around one thing: the GPU.
From large-model training to inference clusters to edge and cloud compute, the whole industry has been asking who can obtain more GPUs and who can pack more accelerator cards into data centers. The entire AI industry revolves around GPUs, which has also driven NVIDIA's stock price to new highs.
However, at COMPUTEX 2026, Intel presented a different view: in the next stage of AI, we can't focus on GPUs alone. At the core of this view is the keyword that Intel CEO Lip-Bu Tan (Chen Liwu) stressed repeatedly in his keynote: Agentic AI, or what are commonly called intelligent agents.
Image source: Intel
Intelligent agents are changing the computing ecosystem
The difference between intelligent agents and traditional AI is significant. Traditional AI is like a turn-based question-answering machine, while intelligent agents are designed to enter real-world workflows and actively complete the cycle of "thinking, planning, acting, and reflecting". In other words, they need to read data, call tools, execute tasks, check results, and then keep adjusting the next step based on feedback.
This means that AI inference is no longer a one-off exchange but a continuously running system that makes its own decisions and does its own reasoning, which fundamentally changes how computing power is used. Hence Intel's central claim this time: Agentic AI will reshape the ratio of compute resources in data centers.
Currently, in frontier model training, the ratio of CPUs to GPUs can be close to 1:8, with GPUs bearing the vast majority of the compute load. In agentic inference, however, the CPU is responsible for task orchestration, tool invocation, data movement, and system coordination. The CPU-to-GPU ratio then gradually approaches 1:1, and an even higher CPU density may be required to break tasks down quickly.
When an intelligent agent not only generates an answer but also has to keep calling models, tools, and external systems, its working state is completely different from that of traditional AI. Intel cited one figure in the speech: compared with single-round inference, an intelligent agent's token consumption can increase by up to 1000 times.
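The "thinking, planning, acting, and reflecting" cycle can be illustrated with a deliberately tiny sketch. Everything here is hypothetical and invented for illustration (the `run_agent` function, the `TOOLS` table, the toy goal of driving a counter to a target value); it shows only the shape of the loop, not how any real agent framework or Intel's software works.

```python
from typing import Callable, Dict, List

# Hypothetical "tools" the agent may call; a real agent would call APIs or models.
TOOLS: Dict[str, Callable[[int], int]] = {
    "add_one": lambda x: x + 1,
    "add_five": lambda x: x + 5,
}


def run_agent(goal: int, tools: Dict[str, Callable[[int], int]],
              max_steps: int = 20) -> List[str]:
    """Toy agent loop that drives a counter from 0 toward a non-negative goal."""
    state = 0
    trace: List[str] = []
    for _ in range(max_steps):
        # Think: compare the current state with the goal.
        gap = goal - state
        if gap == 0:
            trace.append("done")
            break
        # Plan: choose the tool that should close the gap fastest.
        tool_name = "add_five" if gap >= 5 else "add_one"
        # Act: call the chosen tool.
        new_state = tools[tool_name](state)
        # Reflect: keep the result only if it moved us closer to the goal.
        if abs(goal - new_state) < abs(gap):
            state = new_state
            trace.append(f"{tool_name} -> {state}")
        else:
            trace.append(f"{tool_name} rejected")
    return trace


print(run_agent(7, TOOLS))
# ['add_five -> 5', 'add_one -> 6', 'add_one -> 7', 'done']
```

Even in this toy, a single request turns into several rounds of deciding, calling, and checking, which is the kind of orchestration work the article says lands on the CPU.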
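To make the ratio shift concrete, here is a small back-of-the-envelope calculation. The 1:8 and 1:1 ratios come from the figures above; the cluster size of 1,024 GPUs and the helper name `cpus_needed` are assumptions chosen purely for illustration.

```python
import math
from fractions import Fraction

# CPU-to-GPU ratios quoted in the article.
TRAINING_RATIO = Fraction(1, 8)   # about 1 CPU per 8 GPUs in frontier training
AGENTIC_RATIO = Fraction(1, 1)    # approaching 1 CPU per GPU for agentic inference


def cpus_needed(gpus: int, cpu_per_gpu: Fraction) -> int:
    """CPUs required to serve `gpus` GPUs at the given CPU:GPU ratio (rounded up)."""
    return math.ceil(gpus * cpu_per_gpu)


gpus = 1024  # hypothetical cluster size, not a figure from Intel
training_cpus = cpus_needed(gpus, TRAINING_RATIO)
agentic_cpus = cpus_needed(gpus, AGENTIC_RATIO)

print(training_cpus, agentic_cpus, agentic_cpus // training_cpus)
# 128 1024 8
```

Under these assumptions, the same GPU fleet would need eight times as many CPUs once it moves from training to agentic inference.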
Image source: Intel
In other words, what intelligent agents bring is not simply more inference, but a more complex, high-frequency, and fragmented system load. Leaving all of that load to the GPU would be both inefficient and expensive.
The Xeon 6+ processor that Intel released this time is built on the Intel 18A process, with up to 288 energy-efficient cores and up to 576MB of L3 cache. It is meant to deliver higher energy efficiency and more stable sustained performance for cloud-native, Agentic AI, and network-intensive workloads.
In Intel's solution, a single liquid-cooled rack with 32U of compute space can provide 36,864 cores, while rack power consumption is only about 100kW, enough to support high-density agent deployment. Although 100kW sounds intimidating, it is significantly lower than what earlier server racks needed to reach the same performance.
Beyond the Xeon 6+, there is something even more worthy of attention: Intel's re-partitioning of the inference architecture.
In the speech, Intel announced that it is working with partners including SambaNova, Vista Equity Partners, and Cambium Capital to officially launch a new fully decoupled inference solution. The solution runs on the Vector Core Compute intelligent agent cloud, with the Intel Xeon 6 processor responsible for orchestration and execution, the SambaNova SN40 RDU responsible for decoding, and the NVIDIA Blackwell GPU responsible for prefill.
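The rack figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above, plus two assumptions that Intel did not state here: that every processor in the rack is the maximum 288-core part, and that the "about 100kW" figure is taken as exactly 100,000 W for the whole rack, covering everything in it and not just the CPUs.

```python
CORES_PER_RACK = 36_864      # from the article
MAX_CORES_PER_CPU = 288      # from the article (maximum core count per Xeon 6+)
RACK_UNITS = 32              # from the article (32U of compute space)
RACK_POWER_W = 100_000       # "about 100kW", treated as exact for this estimate

# Assumption: all processors are the 288-core part, so the division is exact.
assert CORES_PER_RACK % MAX_CORES_PER_CPU == 0
cpus_per_rack = CORES_PER_RACK // MAX_CORES_PER_CPU   # 128 processors
cpus_per_u = cpus_per_rack / RACK_UNITS               # 4.0 processors per U

# Rack-level power divided evenly; this is a rough budget, not a measured value.
watts_per_cpu = RACK_POWER_W / cpus_per_rack          # 781.25 W
watts_per_core = RACK_POWER_W / CORES_PER_RACK        # about 2.71 W

print(cpus_per_rack, cpus_per_u, watts_per_cpu, round(watts_per_core, 2))
# 128 4.0 781.25 2.71
```

On these assumptions the rack works out to 128 processors, or four per rack unit, with a whole-rack power budget of under 3 W per core.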
Image source: Intel
This new solution is designed specifically for agent workloads. Many past AI systems handed most of the inference pipeline to the GPU. In this system, the CPU, RDU, and GPU each have their own job, handling system scheduling, decoding, and prefill respectively. Each inference stage can therefore run on the hardware best suited to it, which improves overall efficiency.
After introducing the Xeon 6+, Intel brought back the recently released 3rd-generation Core Ultra processor. It is another part of Intel's AI ecosystem: the core of on-device AI at the edge. In the speech, the hybrid local server that Intel demonstrated with Perplexity was built on the 3rd-generation Core Ultra and the Xeon 6+ cloud server.
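The division of labor described above can be sketched as a simple stage-to-device table. The device assignments follow the article; the `Stage` enum, the `plan_request` function, and the use of token counts as "work units" are hypothetical names and simplifications introduced here, not part of any Intel, SambaNova, or NVIDIA API.

```python
from enum import Enum
from typing import List, Tuple


class Stage(Enum):
    ORCHESTRATE = "orchestrate"  # scheduling, tool calls, coordination
    PREFILL = "prefill"          # process the whole prompt in one pass
    DECODE = "decode"            # generate output tokens one at a time


# Stage-to-hardware mapping as described in the article.
STAGE_TO_DEVICE = {
    Stage.ORCHESTRATE: "Intel Xeon 6 CPU",
    Stage.PREFILL: "NVIDIA Blackwell GPU",
    Stage.DECODE: "SambaNova SN40 RDU",
}


def plan_request(prompt_tokens: int, output_tokens: int) -> List[Tuple[Stage, str, int]]:
    """Return the ordered (stage, device, work units) plan for one request.

    Work units are illustrative: one orchestration step, one unit per prompt
    token for prefill, and one unit per generated token for decode.
    """
    work = [
        (Stage.ORCHESTRATE, 1),
        (Stage.PREFILL, prompt_tokens),
        (Stage.DECODE, output_tokens),
    ]
    return [(stage, STAGE_TO_DEVICE[stage], units) for stage, units in work]


for stage, device, units in plan_request(prompt_tokens=2000, output_tokens=300):
    print(f"{stage.value:<12} {device:<22} {units}")
```

A real scheduler would also have to move intermediate state between devices after prefill, which this sketch leaves out entirely.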
Image source: Intel
It can dynamically allocate workloads between the local device and the cloud according to the device's capabilities and the nature of the task, further reducing dependence on cloud computing power. This is also the ideal form of the future AI PC: by allocating work dynamically, it lowers token costs while keeping tasks responsive and protecting data privacy.
In addition to PCs, Intel has extended the 3rd-generation Core Ultra to gaming handhelds and edge computing. The newly released Arc G3 series of processors is designed for handheld gaming devices, optimized on the same-generation architecture, and will be available later this month (the integrated graphics that handheld gamers have been waiting for is coming).
From general-purpose to customized, Intel also wants to be "everywhere"
In addition to general-purpose processors, Intel also emphasized customized chips this time, a business that Chen Liwu has been promoting since he became Intel's CEO.
Intel believes that customized chips will have a huge market in the future. As AI enters different industries, customers will grow increasingly dissatisfied with general-purpose computing power. In pursuit of higher efficiency and performance, they will gradually turn to customized chips to stay competitive.
In the speech, Intel mentioned that it is collaborating with Google to launch IPUs, which are very important for cloud service providers seeking to improve infrastructure performance. Intel is also collaborating with telecommunications customers such as Ericsson to provide advanced wireless infrastructure chips globally.
This is another theme of Chen Liwu's speech: Intel no longer relies on a single general-purpose chip to win the market. Instead, it packages chips, systems, software, and industry partnerships into complete solutions that can be customized to the needs of different enterprises, thereby making the most of Intel's advantages.
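A local-versus-cloud dispatcher of this kind might look roughly like the following. The routing policy, the `Task` fields, and the 4,000-token local budget are all assumptions made up for this sketch; the article does not describe how the Intel and Perplexity demo actually decides where work runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    est_tokens: int   # rough estimate of tokens the task will consume
    private: bool     # True if the data should not leave the device


def route(task: Task, local_token_budget: int = 4000) -> str:
    """Decide where a task runs under a hypothetical two-rule policy."""
    # Rule 1: privacy-sensitive work always stays on the local device.
    if task.private:
        return "local"
    # Rule 2: small jobs run locally to save cloud token costs and latency.
    if task.est_tokens <= local_token_budget:
        return "local"
    # Everything else goes to the cloud, where more compute is available.
    return "cloud"


tasks = [
    Task("summarize personal notes", est_tokens=20_000, private=True),
    Task("rename files", est_tokens=500, private=False),
    Task("multi-source research report", est_tokens=150_000, private=False),
]
for t in tasks:
    print(f"{t.name}: {route(t)}")
```

A production policy would likely also weigh battery state, local model quality, and network conditions; the two rules here only mirror the privacy and cost goals named in the text.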
Image source: Intel
In Lei Technology's view, Intel is redefining its position in the ecosystem: data centers need CPUs to orchestrate intelligent agents, inference systems need heterogeneous decoupling to reduce costs, PCs need local AI to handle privacy and compliance, edge and embodied intelligence need highly energy-efficient chips, and industry customers need customized chips.
By meeting the needs of enterprises in different fields and different links, Intel will become even more "everywhere" than NVIDIA.
Of course, Intel still faces great pressure. NVIDIA's advantages in AI accelerators and its software ecosystem remain obvious, and AMD keeps pushing in both server CPUs and AI chips. Whether Intel can succeed on this path ultimately depends on how quickly the 18A process reaches mass production, whether the Xeon 6+ rack-level solution can be deployed quickly, and whether customers see significant benefits from the new solution.
But at least this time, Intel's direction is clearer than before.
As AI enters the era of intelligent agents, the competition is no longer just a contest of single-chip peak performance; it is about how efficiently the entire computing system works together. GPUs are still important, but CPUs, edge devices, local AI, and customized chips will become crucial again as well.
What Intel wants to seize is precisely this window, in which the division of labor in AI infrastructure is being redrawn.