Exclusive Interview with Apple's Chip Executive Doug Brooks: Why Isn't Apple Afraid as the Entire Industry "Surrounds" Unified Memory?

One China Principle with respective interpretations

The PC and chip industries are all vying for a share of Apple's "unified memory architecture" market.

In 2024, Lunar Lake brought memory into the chip package; in 2025, Strix Halo pushed unified memory bandwidth to 256GB/s; in the first half of 2026, the newly launched Snapdragon X2 Elite series adopted a similar shared memory architecture; and this week, the Windows on ARM camp rallied once more, this time under the RTX Spark banner, to mount a challenge.

A large, unified memory pool to drive edge-side AI: put simply, the entire industry is copying Apple's homework like crazy.

But the current question is, will these competitors, who are at least five or six years behind Apple, succeed in their joint siege? Where does Apple's confidence in the unified memory architecture come from?

On the eve of WWDC, ifanr had an exclusive interview with Doug Brooks, the senior product manager of Apple silicon.

Doug Brooks, Senior Product Manager, Apple silicon

Memory bandwidth is more important than unified memory

When talking about the design logic of Apple silicon, Brooks told me: Apple is a chip manufacturer that doesn't have to consider external customers.

"We're not the kind of chip manufacturer that makes a bunch of chips and hopes others will put them into various systems or utilize different functions of the chips. Our chips are designed solely for our own systems, and the system design and chip design influence each other exclusively."

In essence, Apple's chip designers know exactly what kind of workloads and workflows they need to optimize for in the chips. Conversely, they don't have to worry at all that the features built into the chips won't be utilized by devices like the iPhone, iPad, and Mac.

Having just an architectural lead is not enough. Brooks repeatedly emphasized "balance".

Thanks to the unified memory architecture adopted since the first-generation A-series chips, core computing units such as the CPU, the GPU (including its in-core neural network accelerators), and the Neural Engine (ANE/NPU) all sit on the same chip and connect uniformly to memory, whether inside the package or off-chip.

But more crucial than the single-chip design is that Apple's unified memory architecture ensures that the CPU, GPU, and Neural Engine can share and access this large memory pool, as shown in the following figure.

This is something we haven't seen in other similar unified memory architecture products so far.

Starting with the M5 Pro/Max, Apple also moved to a fused two-chip package architecture. In this architecture, the bit-width of the inter-chip interconnect within a single SoC is not fixed; it increases with the SKU. Brooks told ifanr:

"A system with a lot of computing power but insufficient memory bandwidth? Apple won't build such a system. From the M5 to the M5 Pro and then to the M5 Max, the number of GPU cores doubles and then quadruples: the M5 chip is equipped with a 10-core GPU, which expands to a 20-core GPU in the M5 Pro, and the top-of-the-line M5 Max is equipped with a 40-core GPU.

But you shouldn't just focus on the increase in the number of cores. We've also doubled the memory bit-width. As the product line is upgraded, we also double the unified memory bandwidth of each chip. Only in this way can we ensure that all kinds of workflows required by users can be met by the computing power of the entire chip."
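The "balance" Brooks describes can be checked with simple arithmetic: if bandwidth doubles whenever the GPU core count doubles, bandwidth per GPU core stays constant. The sketch below is illustrative only. It takes the 614GB/s M5 Max peak quoted elsewhere in this article and assumes exact halving per tier going down, which is an inference from Brooks's statement rather than a published spec for the M5 or M5 Pro.

```python
# GPU core counts quoted by Brooks, ordered from smallest to largest chip.
GPU_CORES = {"M5": 10, "M5 Pro": 20, "M5 Max": 40}

# Assumption: the M5 Max peak from this article; lower tiers are derived by
# halving once per tier, not taken from official spec sheets.
M5_MAX_BANDWIDTH_GBPS = 614.0


def bandwidth_per_tier(top_gbps):
    """Derive each tier's bandwidth by halving down from the top tier."""
    tiers = list(GPU_CORES)
    last = len(tiers) - 1
    return {name: top_gbps / 2 ** (last - i) for i, name in enumerate(tiers)}


def bandwidth_per_core(top_gbps):
    """Bandwidth available per GPU core on each tier, in GB/s."""
    tier_bw = bandwidth_per_tier(top_gbps)
    return {name: tier_bw[name] / GPU_CORES[name] for name in GPU_CORES}


for name, value in bandwidth_per_core(M5_MAX_BANDWIDTH_GBPS).items():
    print(f"{name}: {value:.2f} GB/s per GPU core")
```

Under these assumptions every tier lands on the same per-core figure, which is the point of the quote: compute and bandwidth scale together.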

Comparing "unified memory architectures" by the numbers

Apple didn't directly respond to the comparison between Apple silicon and other market solutions. However, ifanr compared the public specifications of several products that adopt a similar unified memory architecture: Strix Halo has a bandwidth of 256GB/s, GB10/RTX Spark has a bandwidth of 273GB/s, the Snapdragon X2 Elite Extreme has a bandwidth of 228GB/s, and Apple's M5 Max can reach a maximum of 614GB/s.

In other words, the memory bandwidth of every other known industry solution has only just reached Apple's mid-to-high-end level, and Apple's top-end chip still delivers more than twice as much. And it has taken them more than two years to reach this point.
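The gap can be made concrete by dividing the M5 Max figure by each competitor's figure. This is a minimal sketch using only the bandwidth numbers quoted above; the dictionary keys and function name are illustrative.

```python
# Peak memory bandwidth in GB/s, as quoted in this article.
BANDWIDTH_GBPS = {
    "Strix Halo": 256,
    "GB10 / RTX Spark": 273,
    "Snapdragon X2 Elite Extreme": 228,
    "Apple M5 Max": 614,
}


def ratio_to_reference(specs, reference="Apple M5 Max"):
    """How many times more bandwidth the reference chip has than each rival."""
    ref = specs[reference]
    return {
        name: round(ref / gbps, 2)
        for name, gbps in specs.items()
        if name != reference
    }


for name, ratio in ratio_to_reference(BANDWIDTH_GBPS).items():
    print(f"M5 Max has {ratio}x the bandwidth of {name}")
```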

The die shot of RTX Spark shows that this "explosive" and disruptive SoC has obvious bottlenecks: it is composed of two chips spliced together, with the Blackwell GPU on one side and other components such as the MediaTek CPU on the other side, connected by an NVLink bridge in the middle.

The DRAM and memory controller are located on the CPU side. There is no memory controller on the GPU side, so the GPU needs to access the memory via the NVLink through the memory controller on the CPU side.

That is to say, although the bidirectional bandwidth of this NVLink C2C can reach about 600GB/s, the real memory bandwidth of this SoC will not exceed the level of GB10, that is, it is capped at around 273GB/s, rounded up to 300GB/s.
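The reasoning here is a bottleneck calculation: on a serial data path, sustained throughput cannot exceed the slowest stage. The sketch below models the GPU-to-DRAM path with the two figures from the article; the stage names, and the simplification that the path is purely serial with no caching effects, are assumptions for illustration.

```python
def bottleneck(stages):
    """Return (stage_name, GB/s) for the slowest stage on a serial data path."""
    return min(stages.items(), key=lambda item: item[1])


# Figures from the article; the GPU reaches DRAM only through the link and
# then the CPU-side memory controller.
gpu_memory_path = {
    "NVLink C2C (bidirectional total)": 600,
    "DRAM via CPU-side memory controller": 273,
}

stage, gbps = bottleneck(gpu_memory_path)
print(f"GPU memory bandwidth is capped at {gbps} GB/s by: {stage}")
```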

What's more worth mentioning is that RTX Spark is not a new design in 2026, and it can't even be considered a 2025 design. The "2443" engraving on the SoC in the photos from the Computex site means that it was packaged in the 43rd week of 2024.
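Reading the marking as a date code can be sketched in a few lines. This assumes the common four-digit YYWW convention (two-digit year, then week number), a 2000s century, and ISO 8601 week numbering for the calendar date; a given packaging house may number weeks slightly differently.

```python
from datetime import date


def decode_date_code(code, century=2000):
    """Split a four-digit YYWW marking into (year, week)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected four digits, got {code!r}")
    year = century + int(code[:2])
    week = int(code[2:])
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range: {week}")
    return year, week


def week_start(code):
    """Monday of the decoded week, assuming ISO 8601 week numbering."""
    year, week = decode_date_code(code)
    return date.fromisocalendar(year, week, 1)


print(decode_date_code("2443"))  # (2024, 43)
print(week_start("2443"))        # 2024-10-21
```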

The die shot shows that its MediaTek CPU uses stock Arm X925 and A725 core designs from 2024, which by 2026 are at least one or even two generations behind.

Repackaging a two-year-old processor and releasing it as a new chip: this in itself shows how hot the unified memory trend has become.

There is room for improvement even for the leader

Apple's Neural Engine (ANE) made its debut with the A11 chip in 2017. In previous articles, we have demonstrated that although the ANE was only used for neural network computing scenarios largely unrelated to AI at that time, it laid a crucial foundation for Apple to embrace today's AI boom, especially edge-side AI workflows.

Good as it is, the ANE has long been closed off. Specifically, although the Core ML framework can call the ANE, Apple has not given developers enough tools and capabilities to decide when and how to call it for non-inference workloads. It's like having a gold mine of computing power whose door is sealed.

So at the beginning of this year, community developer Manjeet Singh reverse-engineered the ANE on the M4 processor on his own, and he actually succeeded. He found that the M4 ANE is extremely power-efficient, delivering 6.6 TOPS per watt when running at full capacity.

Moreover, he later managed to train a complete 100-million-parameter transformer model on an M4 Mac mini using only the ANE's computing power, bypassing the Core ML restrictions entirely. Training took 50,000 steps at 96 milliseconds per step, and the ANE's power draw stayed under 1W (the weights and the Adam optimizer still required CPU support, and the combined power consumption of the ANE and CPU was less than 8W).
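For a sense of scale, the step count and per-step time imply a total run time, and the combined power ceiling implies an upper bound on energy. This is back-of-the-envelope arithmetic only: it assumes steps ran back to back with no overhead and that combined power never exceeded the 8W figure; neither total is stated in the source.

```python
STEPS = 50_000
MS_PER_STEP = 96
COMBINED_POWER_CEILING_W = 8  # "less than 8W" for ANE plus CPU


def training_seconds(steps, ms_per_step):
    """Total compute time if every step takes exactly ms_per_step."""
    return steps * ms_per_step / 1000


def energy_upper_bound_wh(seconds, watts):
    """Upper bound on energy in watt-hours at a constant power ceiling."""
    return watts * seconds / 3600


seconds = training_seconds(STEPS, MS_PER_STEP)
print(f"Run time: {seconds / 60:.0f} minutes")
print(f"Energy: under {energy_upper_bound_wh(seconds, COMBINED_POWER_CEILING_W):.1f} Wh")
```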

It turns out that the ANE is essentially a matrix multiplication engine (INT8/FP16), and Apple's official "inference-only" positioning is just that, a positioning. After all, the matrix multiplications in the backward pass of training are still matrix multiplications. Apple simply never provided a public training interface for the ANE, so Singh built one himself.

Obviously, people are very interested in this unexploited gold mine that is the ANE.

Its allure lies not only in the ANE's performance per watt, but also in the fact that more than 1 billion active devices are equipped with it. This lets iPhones, iPads, and Macs not only drive Apple Intelligence features, such as the rumored upcoming AI Siri, without hurting battery life or thermals, but also potentially handle high-performance, low-power local AI workloads that today's developers haven't even imagined yet.

Or maybe they've thought of it, but Apple just didn't open up the ANE for them to use before.

During this exclusive interview, we asked Brooks face-to-face how developers should choose computing units and what Apple thinks of the community's reverse-engineering of the ANE.

He said that Apple provides a series of APIs at different levels. With a high-level API like Core ML, developers can simply say "run this model for me" and let the system decide whether to use the Neural Engine or the GPU (MLComputeUnits.all); alternatively, they can specify "I want it to run on the CPU, GPU, or ANE".

He specifically emphasized: "We want to give developers as much control as possible."

This actually refers to the fact that at WWDC 2025, Apple first introduced tensors as a native resource type in Metal 4, allowing users to more precisely control the calculations in the shader or the newly added neural accelerator within the GPU core.

Brooks didn't directly address the reverse-engineering of the ANE, but he still spoke highly of the community:

"Taking a step back and looking at the big picture, the Mac has always been an innovative AI platform. We're very glad to see such an active community doing all kinds of exciting work at various levels and a large amount of open-source research and contributions."

The door to the gold mine won't be closed forever, but as the owner of the mine, Apple has always attached great importance to system security and still needs to make a careful decision about who to give the key to.

In addition, ifanr also noticed that Apple's current progress in edge-side models is a bit disappointing.

Apple's Foundation Models framework integrates Apple's self-trained edge-side large models directly into the iOS and macOS systems. Developers can call these models with just a few lines of code. There is no cloud API billing, no need to purchase tokens or subscribe to a paid service, and the models can be used offline directly. The data also stays local and is encrypted throughout. So far, no other company can offer such an architecture. Brooks told ifanr:

"It's not only free, but also doesn't require an internet connection. It can run locally anytime and anywhere, which is very powerful in itself. What makes me even more excited is that the Foundation Models API has been adopted by thousands of applications to implement various AI functions, not just simple text processing, but also extremely powerful productivity tools."

But today, especially for those professional users who are the most productive and most eager to transform their workflows, the way they use AI has long gone beyond the simple dialogue interface. They have entered a new workflow era where they can launch a task at any time and mobilize dozens or even hundreds of agents to split up, delegate, cross-verify, and summarize the work.

At this time, is this local "small" model smart enough?

Fortunately, the answer is not a binary choice. Apple's current strategy is to use Private Cloud Compute in Apple Intelligence to call more powerful cloud-based models while keeping data secure and discarding it after use.

Currently, the ceiling of edge-side models, or more specifically of Apple's own edge-side model, is clearly visible. It has about 3 billion parameters, and in Apple's own technical report it is benchmarked against earlier models with relatively small parameter counts, such as Qwen-2.5-3B and Gemma-3-4B. Models of this scale are good at lightweight generation tasks such as summarization, rewriting, and image editing, and have great potential in application scenarios.

But once faced with complex reasoning, code-related tasks, or tasks that require world knowledge, it pales in comparison with today's flagship models trained specifically for agent tasks by companies like OpenAI, Anthropic, Kimi, and MiniMax. According to the latest public information, Apple's server-side model is still "behind GPT-4o and Llama-4 Scout" and is far from the first-tier models.

Ultimately, Apple's moat lies in its hardware, its integration, its realization of a true unified memory architecture, and memory bandwidth that is unmatched in the consumer-grade computer market. However, the capabilities of the model itself have become the most worrying part of Apple's system.

But Apple may have a trump card up its sleeve.

Will the outcome be revealed at WWDC?

The annual Apple Worldwide Developers Conference is about to be held at midnight on June 9th, Beijing time.

If the reporting from Bloomberg's Mark Gurman, a well-known Apple watcher, is accurate, Apple is very likely to replace Core ML, which has been in use for many years, with a brand-new Core AI framework. This rumored framework would, for the first time, let developers directly access models from selected providers (in principle, ones approved by Apple) in ways they already know well, such as through APIs.

In addition, it is rumored that the new-generation edge-side base model that Apple may soon adopt could be a new model distilled from the models of other leading US AI companies. Apple may even choose Google, OpenAI, Anthropic, etc. as the default model

