Citrini: AMD and Apple Are Both Betting on Flash Memory to Replace DRAM, Which Could Cut Memory Costs 55-Fold
The dependence of AI inference on expensive DRAM is loosening.
AMD announced the acquisition of memory optimization company MEXT, introducing AI-driven flash memory optimization technology to data centers. This marks a structural shift in the AI storage architecture, and the core driving force boils down to one number: the cost of flash memory is only 1/55 of that of DRAM.
On Monday, AMD announced the completion of the acquisition of MEXT for an undisclosed amount. The AI-driven predictive memory technology developed by MEXT aims to make flash memory behave more like DRAM, expanding the available memory capacity while maintaining performance and efficiency.
AMD said that this acquisition will expand its AI product portfolio, helping data center customers improve performance, reduce total cost of ownership, and accelerate workload deployment. "The demand for memory is growing in every category of enterprise computing," AMD said in a statement.
The acquisition news drove AMD's stock price up 7.7% to $550.75 on Monday, with its market value approaching the $900 billion mark. The S&P 500 index rose 1.8% on the same day, and AMD's cumulative increase this year has reached 323%. Citigroup upgraded AMD's rating from neutral to buy last Friday and raised its target price from $460 to $575.
Notably, Apple began promoting its "LLM in a Flash" edge-side solution as early as 2024. Behind these moves is an intensifying DRAM supply crisis. According to TrendForce data, high-bandwidth memory (HBM) already accounts for about a quarter of total DRAM wafer production capacity, and DRAM contract prices soared about 90% quarter-on-quarter in the first quarter of 2026.
Citrini Research pointed out that AI storage demand has grown so large that a multi-layer architecture is needed to handle it. Flash memory does not replace HBM; it absorbs the overflow demand for capacity. This architectural restructuring is repricing the entire AI storage supply chain.
The Memory Tax Crisis: The Bottleneck Spreads from AI to the Entire Economy
Earlier this month, Shawn Kim's team at Morgan Stanley judged that soaring memory prices and scarce supply are evolving into a broad risk for the digital economy, "spreading from the bottleneck of AI infrastructure to hardware profit margins, device affordability, cloud costs, inflation, and even policy levels."
A concrete case illustrates the pressure: Xbox CEO Asha Sharma said last week that memory costs have risen about fivefold in the past two years, preventing the company from producing as many game consoles as consumers need.
HBM's continued encroachment on DRAM production capacity is the core driver of this crisis. According to data disclosed by TrendForce and by manufacturers such as Samsung, SK Hynix, and Micron, HBM's share of DRAM wafer production capacity has climbed from 2% in 2020 to an estimated 25% in 2026. Hyperscale cloud providers lock in future wafer output through multi-year contracts, further squeezing the capacity available for standard chips used in mobile phones and PCs.
The construction of new DRAM production capacity also faces structural constraints. Capacity expansion relies on EUV lithography machines to print finer line widths. Each EUV device costs about $200 million, and the investment in a new wafer factory can easily reach tens of billions of dollars. Even under favorable circumstances, the construction cycle takes several years. This supply rigidity is the fundamental reason for the persistence of this shortage.
The 55-fold Cost Difference: The Economic Logic of Flash Memory Replacement
According to Citrini Research's calculation, the cost per bit of flash memory is about 1/55 of that of DRAM: QLC NAND costs about $0.05 per GB and DDR5 DRAM about $2.75 per GB (2.75 / 0.05 = 55), while HBM3E runs as high as $15 per GB.
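The arithmetic behind the 55-fold figure can be checked in a few lines. The sketch below uses only the per-GB prices quoted above; the function names and the 1 TB (1,024 GB) example are illustrative assumptions, not part of Citrini's model.

```python
# Per-GB prices as quoted in the article (Citrini Research estimates, USD).
PRICE_PER_GB = {
    "QLC NAND": 0.05,
    "DDR5 DRAM": 2.75,
    "HBM3E": 15.00,
}


def cost_multiple(tier: str, baseline: str = "QLC NAND") -> float:
    """How many times more expensive `tier` is than `baseline`, per GB."""
    return PRICE_PER_GB[tier] / PRICE_PER_GB[baseline]


def capacity_cost(tier: str, gigabytes: float) -> float:
    """Cost in USD of `gigabytes` of capacity in the given tier."""
    return PRICE_PER_GB[tier] * gigabytes


# Compare every tier against QLC NAND and price 1 TB (1,024 GB) of each.
for tier in PRICE_PER_GB:
    print(f"{tier}: {cost_multiple(tier):.0f}x QLC NAND, "
          f"1 TB costs about ${capacity_cost(tier, 1024):,.0f}")
```

On these figures DDR5 works out to 55 times the price of QLC NAND per GB, and HBM3E to 300 times.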
This price gap is exploitable because the largest single consumer of memory in AI inference, the KV cache, demands far less read speed than the decoding path for model weights. The KV cache records the context of all previous tokens at each generation step of the model and can grow to hundreds of GB in long conversations. For sequentially read data of this kind, DRAM's speed advantage narrows significantly, while flash memory's capacity advantage comes fully into play.
Flash memory's path to more capacity is also fundamentally different from DRAM's. Flash adds capacity by vertically stacking more cell layers, using the deposition and etching processes already in place at existing factories, with no need for new lithography nodes and no claim on EUV resources. Flash memory controllers are built on mature 6/7 nanometer processes, well away from the bottleneck nodes that are constraining advanced processes.
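To see how a KV cache reaches hundreds of GB, here is a rough sizing sketch. It uses the standard transformer accounting (one key and one value vector per layer, per KV head, per token). The model dimensions are hypothetical and do not come from the article or from any specific product.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Size of a transformer KV cache in bytes.

    The leading factor of 2 counts the key tensor and the value tensor.
    bytes_per_value=2 assumes 16-bit (fp16/bf16) storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


GIB = 2 ** 30

# Hypothetical large model: 80 layers, 8 KV heads of dimension 128.
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
one_long_chat = kv_cache_bytes(80, 8, 128, seq_len=131_072)
eight_long_chats = kv_cache_bytes(80, 8, 128, seq_len=131_072, batch_size=8)

print(f"per token:            {per_token / 1024:.0f} KiB")
print(f"one 128K-token chat:  {one_long_chat / GIB:.0f} GiB")
print(f"eight such chats:     {eight_long_chats / GIB:.0f} GiB")
```

Under these assumed dimensions, a single 128K-token conversation holds 40 GiB of cache and eight concurrent ones hold 320 GiB, which is consistent with the "hundreds of GB" scale described above.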
The paper "LLM in a Flash," previously published by Apple researchers, provides methodological support. By storing large language model parameters in a device's flash memory and fetching them into DRAM on demand, models larger than the available DRAM can run on DRAM-constrained devices. Compared with naive loading, inference is 4 to 5 times faster on the CPU and 20 to 25 times faster on the GPU.
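The general idea of keeping parameters in flash and pulling them into DRAM on demand can be illustrated with a small cache. This is a generic least-recently-used sketch, not the algorithm from Apple's paper. The class name, the file layout (fixed-size layers stored back to back), and the slot count are all assumptions made for illustration, and a temporary file stands in for flash.

```python
import os
import tempfile
from collections import OrderedDict


class FlashBackedWeights:
    """Keeps at most `dram_slots` layers resident; the rest stay in a file."""

    def __init__(self, path: str, layer_bytes: int, dram_slots: int):
        self.path = path
        self.layer_bytes = layer_bytes
        self.dram_slots = dram_slots
        self.resident = OrderedDict()  # layer index -> bytes, in LRU order
        self.flash_reads = 0

    def get(self, layer: int) -> bytes:
        if layer in self.resident:
            self.resident.move_to_end(layer)  # mark as most recently used
            return self.resident[layer]
        # Miss: read just this layer from the backing file ("flash").
        with open(self.path, "rb") as f:
            f.seek(layer * self.layer_bytes)
            data = f.read(self.layer_bytes)
        self.flash_reads += 1
        self.resident[layer] = data
        if len(self.resident) > self.dram_slots:
            self.resident.popitem(last=False)  # evict least recently used
        return data


def write_demo_weights(num_layers: int, layer_bytes: int) -> str:
    """Write a stand-in weight file: layer i is filled with byte value i."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_layers):
            f.write(bytes([i]) * layer_bytes)
    return path


# Eight layers on "flash", room for only three in "DRAM".
path = write_demo_weights(num_layers=8, layer_bytes=4096)
weights = FlashBackedWeights(path, layer_bytes=4096, dram_slots=3)
for layer in [0, 1, 2, 0, 1, 3, 0]:
    weights.get(layer)
print("flash reads:", weights.flash_reads)
print("resident layers:", list(weights.resident))
os.remove(path)
```

Repeated requests for a resident layer never touch the file, so the number of flash reads depends on how well the access pattern fits the available DRAM slots.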
Two Paths: Synchronous Evolution of Data Centers and Edge Devices
AMD's acquisition focuses on the data center scenario. By integrating MEXT technology into its data center product portfolio, AMD aims to help enterprise customers improve resource utilization efficiency and reduce costs in AI workload deployment.
Shawn Kim's team at Morgan Stanley believes that despite the ongoing memory shortage, AMD still holds a structural advantage in the cloud market: "the demand for agentic AI-driven CPUs is structurally beneficial to the expansion of AMD's market share in the cloud." Citigroup's bullish view of AMD rests more on its favorable competitive position against NVIDIA in GPU sales.
Apple's path focuses on the edge side. The "LLM in a Flash" solution shifts part of model inference's dependence on expensive cloud memory onto the device's local flash memory, reducing cloud computing costs and giving edge-side AI applications a workable memory architecture.
According to Citrini Research, the two paths lead to the same conclusion: the memory hierarchy of AI inference is being rebuilt. Infrequently accessed KV caches, model weights, and edge-side data will gradually move from high-priced HBM/DRAM to the NAND Flash/SSD layer, forming a multi-layer storage architecture.
This architectural change is rippling through several layers of the supply chain. According to Citrini Research, NAND manufacturers benefit most directly. High-capacity NAND, enterprise-level SSDs, and QLC NAND are the purest plays, with names including SanDisk, Western Digital, Micron, and Kioxia.
The SSD controller layer is considered to have the most staying power, because making flash memory truly approach a memory-like experience depends on optimizing the controller, firmware, and NVMe architecture. Companies involved include Silicon Motion and Marvell. The CXL/PCIe high-speed interconnect layer also benefits.
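One way to picture such a multi-layer architecture is a placement policy that sends the most frequently read data to the fastest tier that still has room. The greedy sketch below is purely illustrative: the tier capacities, the data items, and their access rates are invented, and real systems weigh latency, bandwidth, and endurance as well.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    name: str
    size_gb: float
    reads_per_sec: float  # hypothetical access frequency


# Tiers ordered fastest to slowest: (name, capacity in GB).
# Capacities are made up for illustration.
TIERS = [("HBM", 192), ("DRAM", 1024), ("NAND", 32768)]


def place(blobs):
    """Greedy: hottest data first, into the fastest tier with room left."""
    free = {name: capacity for name, capacity in TIERS}
    placement = {}
    for blob in sorted(blobs, key=lambda b: b.reads_per_sec, reverse=True):
        for name, _ in TIERS:
            if blob.size_gb <= free[name]:
                free[name] -= blob.size_gb
                placement[blob.name] = name
                break
        else:
            raise ValueError(f"no tier can hold {blob.name}")
    return placement


workload = [
    Blob("cold KV cache", 2000, 0.1),
    Blob("active weights", 140, 1000),
    Blob("warm KV cache", 300, 5),
    Blob("hot KV cache", 40, 500),
]
for name, tier in place(workload).items():
    print(f"{name} -> {tier}")
```

With these made-up numbers, the active weights and hot KV cache land in HBM, the warm KV cache in DRAM, and the cold KV cache in NAND, which mirrors the overflow role the article describes for flash.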
This article does not constitute personal investment advice and does not represent the platform's views. The market is risky, and investment should be made with caution. Please make independent judgments and decisions.
This article is from the WeChat official account "Wall Street News", author: Zhang Yaqi, published by 36Kr with authorization.