The world has long suffered from the DRAM shortage.
Data centers are now facing a new crisis: not a shortage of computing power, but the high cost of memory.
In recent years, the rapid, large-scale expansion of AI applications such as large-model inference, in-memory databases, and high-performance computing has pushed data centers to the limit of their memory resources. DRAM, once a standard server component, has become the most expensive and scarce infrastructure resource. Soaring prices and rigid supply are now key factors constraining the pace of AI compute deployment.
According to tracking data from Counterpoint Research, the price of 64GB DIMM memory rose 3.5-fold between the third quarter of 2025 and the first quarter of 2026, and the upward trend has not yet peaked. The cumulative increase is expected to reach fivefold by the third quarter of 2026.
Data from TrendForce is even more striking: in the first quarter of 2026, DRAM contract prices rose by 93% to 98% quarter-on-quarter, driving global DRAM industry revenue up 81% quarter-on-quarter to $97 billion. The upward trend is continuing in the second quarter, with contract prices expected to rise by another 58% to 63%.
The spot market sends an even more direct signal: server-grade DDR5 RDIMM currently trades at $27 to $37 per GB. For a 12TB memory pool, the DRAM hardware alone costs close to $500,000.
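The pool estimate can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 1 TB = 1,024 GB (the article does not say which convention it uses); the function name `pool_cost_usd` is illustrative only.

```python
GB_PER_TB = 1024  # assumption: binary terabytes; with 1,000 GB per TB the totals are about 2% lower


def pool_cost_usd(capacity_tb: float, price_per_gb: float) -> float:
    """DRAM hardware cost of a pool of the given capacity at a flat per-GB price."""
    return capacity_tb * GB_PER_TB * price_per_gb


low = pool_cost_usd(12, 27)   # bottom of the quoted spot range
high = pool_cost_usd(12, 37)  # top of the quoted spot range
print(f"12 TB pool: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the range is $331,776 to $454,656, so the "close to $500,000" figure corresponds to the top of the spot range.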
The DRAM Crisis: A Full-scale Outbreak
The root cause of this price surge is HBM's continuous encroachment on DRAM production capacity.
With explosive demand for high-bandwidth memory in AI training and inference, HBM's share of DRAM wafer production capacity has climbed from 2% in 2020 to an estimated 25% in 2026. The three major manufacturers, Samsung, SK Hynix, and Micron, have all shifted their best production capacity toward high-margin HBM. From 2025 to 2027, HBM accounts for 18%, 22%, and about 30% of overall DRAM wafer starts, respectively. One HBM wafer consumes about three times the production capacity of DDR5. The three manufacturers have also actively cut low-margin orders for mobile phones and PCs and redirected that capacity to AI. With large cloud providers having locked in future wafer output in advance through multi-year contracts, the supply of standard DRAM for servers has been squeezed further.
The rigidity of the supply side means that the shortage is difficult to ease in the short term.
Advanced DRAM manufacturing relies heavily on EUV lithography machines, each of which costs up to about $200 million. A modern wafer fab often requires tens of billions of dollars of investment, and even if everything goes smoothly, construction takes several years. Capacity expansion lags far behind the growth of AI demand.
Jefferies predicts that, excluding the influence of domestic manufacturers, global storage bit supply will grow by only 7% to 8% in 2026, leaving a possible supply gap of about 150,000 to 200,000 wafers per month for DRAM and NAND combined. Micron Technology stated in its financial report for the third fiscal quarter of 2026 that, although industry supply may gradually improve in 2028, it is still difficult to say when storage supply will catch up with continuously growing demand.
The pressure has also spread from data centers to consumer products.
Asha Sharma, the CEO of Xbox, publicly stated that memory costs have increased about fivefold over the past two years, leaving the company unable to produce enough game consoles to meet market demand. Apple has also announced price increases for products such as the iPhone, Mac, and iPad.
The team of Shawn Kim, an analyst at Morgan Stanley, even said bluntly that the soaring memory prices and scarce supply are evolving into a comprehensive risk for the digital economy, "spreading from the bottleneck of AI infrastructure to hardware profit margins, device affordability, cloud costs, inflation, and even the policy level."
The changing share of DRAM in the server bill of materials illustrates the problem more clearly. In 2023, DRAM accounted for about 50% of total server cost. By mid-2026, that share had climbed to between 60% and 90%, averaging about 75%. CPU prices have not fallen, but next to skyrocketing memory prices, the CPU price increase looks insignificant.
More ironic still, the memory bought at such high cost is not heavily utilized. Measurements from large companies such as Meta show that, in general, only about half of the memory capacity in data centers holds active "hot" data, while large amounts of cold data occupy expensive DRAM for long periods.
Facing the high cost and scarcity of DRAM, industry players are looking for alternatives: instead of simply piling up hardware, they are using technology to reduce their reliance on DRAM.
AMD: AI-based Predictive Scheduling, Making Flash Memory "Invisible" as DRAM
AMD has chosen the most lightweight approach, one based purely on software.
In June 2026, AMD announced the acquisition of MEXT, a memory optimization company. The core goal is to bring in AI-driven memory tiering technology that moves cold data from high-cost DRAM to low-cost NAND flash, expanding effective memory capacity at low cost.
MEXT was reportedly founded in 2023, and its founding team has an impressive background. Gary Smerdon, the co-founder and CEO, was previously chief strategy and product officer of Fusion-io, a pioneer in the large-scale commercialization of flash storage that counted Apple and Meta Platforms among its major customers more than a decade ago.
MEXT's AI-based memory tiering technology targets the memory efficiency bottleneck. It transfers infrequently accessed data from expensive DRAM to NAND flash, which costs far less per unit of capacity, without affecting running applications.
MEXT's core product is the Predictive Memory Engine, a fully software-based memory tiering solution. It continuously monitors application access patterns at memory-page granularity and automatically migrates infrequently accessed cold data to NAND flash, whose cost per bit is only about 1/55 that of DRAM. At the same time, an AI model learns the workload's access patterns, predicts which pages will be needed, and prefetches them back into DRAM before the application requests them, so that software reads the data as if it were accessing main memory directly and performance is not affected.
The entire mechanism is completely transparent to the operating system and upper-layer applications. No business code needs to be modified, no additional dedicated hardware is required, and deployment can be completed in just a few minutes.
Official data shows that the solution can increase a system's effective memory capacity by 2 to 4 times and reduce overall infrastructure cost by about 50%. In typical scenarios such as the Neo4j graph database, EDA simulation, and film and television rendering, a 1:1 ratio of DRAM to flash achieves about 95% of the throughput of a pure DRAM configuration at significantly lower cost.
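To see how the capacity and cost figures relate, the media cost of a mixed DRAM and flash configuration can be worked out from the quoted 1/55 cost-per-bit ratio. The sketch below is an illustration only: it counts memory media cost and ignores servers, software licensing, and power, so it is not the same quantity as the "overall infrastructure cost" above. The function name `tiered_cost_per_gb` is hypothetical.

```python
FLASH_COST_RATIO = 1 / 55  # flash cost per bit relative to DRAM, as quoted in the article


def tiered_cost_per_gb(dram_parts: float, flash_parts: float) -> float:
    """Media cost per GB of a DRAM+flash mix, relative to pure DRAM (which is 1.0)."""
    total_capacity = dram_parts + flash_parts
    total_cost = dram_parts * 1.0 + flash_parts * FLASH_COST_RATIO
    return total_cost / total_capacity


for dram, flash in [(1, 1), (1, 3)]:
    expansion = (dram + flash) / dram        # effective capacity vs. DRAM alone
    relative = tiered_cost_per_gb(dram, flash)
    print(f"DRAM:flash {dram}:{flash} -> {expansion:.0f}x capacity, "
          f"{relative:.1%} of the pure-DRAM cost per GB")
```

Under these assumptions a 1:1 mix doubles capacity at about 50.9% of the pure-DRAM media cost per GB, and a 1:3 mix quadruples capacity at about 26.4%, which lines up with the 2x to 4x capacity range quoted above.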
MEXT previously conducted comparative tests on Dell servers and AWS cloud instances:
Comparison of Dell servers and AWS instances with and without MEXT memory expansion (Image source: Nextplat)
Performance and cost-effectiveness of the Neo4j graph database with MEXT memory expansion at memory-to-flash ratios of 1:1 and 1:3 (Image source: Nextplat)
MEXT's approach is not revolutionary. Memory tiering and moving cold data to cheaper storage media are long-established concepts. Earlier technologies, however, never achieved wide adoption in data centers, mainly because their prediction algorithms were not accurate enough. Whenever a prediction missed, the data the program needed had to be moved back from flash to DRAM on demand, and the resulting latency caused an unacceptable performance loss.
MEXT's breakthrough lies in using an AI model for this task. Its Predictive Memory Engine continuously analyzes memory access patterns, uses AI to determine which data pages are most likely to be used next, and proactively moves data from flash memory back to DRAM before the application actually makes a request.
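The difference between reactive tiering and predictive prefetching can be illustrated with a toy simulation. The article does not describe MEXT's actual model, so the sketch below swaps in a deliberately simple stand-in: a table that remembers which page most often followed each page. The class `TieredMemory`, the helper `count_stalls`, and all sizes are hypothetical.

```python
from collections import OrderedDict, defaultdict


class TieredMemory:
    """Toy two-tier page store: a small LRU 'DRAM' in front of an unbounded 'flash' tier."""

    def __init__(self, dram_pages: int, prefetch: bool):
        self.dram = OrderedDict()   # page id -> None, ordered from coldest to hottest
        self.capacity = dram_pages
        self.prefetch = prefetch
        # successors[p][q] counts how often page q was accessed right after page p
        self.successors = defaultdict(lambda: defaultdict(int))
        self.last_page = None
        self.stalls = 0             # accesses that had to wait on flash

    def _load(self, page):
        """Make a page resident in DRAM, demoting the coldest page if DRAM is full."""
        if page in self.dram:
            self.dram.move_to_end(page)
            return
        if len(self.dram) >= self.capacity:
            self.dram.popitem(last=False)
        self.dram[page] = None

    def access(self, page):
        if page not in self.dram:
            self.stalls += 1        # demand miss: the application waits for flash
        self._load(page)
        if self.last_page is not None:
            self.successors[self.last_page][page] += 1
        self.last_page = page
        if self.prefetch and self.successors[page]:
            # Stand-in predictor: fetch the most frequently observed successor early.
            counts = self.successors[page]
            self._load(max(counts, key=counts.get))


def count_stalls(trace, dram_pages, prefetch):
    memory = TieredMemory(dram_pages, prefetch)
    for page in trace:
        memory.access(page)
    return memory.stalls


trace = list(range(8)) * 20         # cyclic scan over 8 pages, DRAM holds only 4
print("LRU only:     ", count_stalls(trace, 4, prefetch=False))
print("with prefetch:", count_stalls(trace, 4, prefetch=True))
```

On this cyclic trace, plain LRU stalls on all 160 accesses because each page is evicted before it is reused, while the predictive variant stalls only 9 times, all during warm-up. Real workloads are far less regular, which is why prediction accuracy is the deciding factor described above.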
For AMD, this acquisition fills a crucial gap in its full-stack capabilities. Alongside its EPYC CPUs, Instinct GPUs, and ROCm software stack, the memory efficiency layer brought by MEXT lets AMD offer customers a complete solution from chips to data-flow scheduling. This helps customers lower total cost of ownership and cut the time GPUs sit idle waiting for data, and it strengthens AMD's competitiveness in the AI infrastructure market.
On the day the acquisition news was announced, AMD's stock price rose nearly 7% during intraday trading, indicating the market's recognition of this approach.
Of course, it remains to be seen how far MEXT's technology can be implemented in AMD's data center products. The physical latency gap between NAND flash and DRAM is an objective fact, and whether software-based AI prediction can truly bridge it depends on real-world performance after large-scale deployment.
Apple: On-device Large Models, "Storing" Models in Flash Memory
While data centers struggle with DRAM costs, consumer devices face the same constraint. The DRAM capacity of mobile phones and other devices is extremely limited, yet they need to support inference for on-device large models. Apple's solution is to keep large models in flash memory and load them into memory on demand.
Apple's latest AFM 3 Core Advanced is a 20-billion-parameter on-device large model. Fully loading it into DRAM in the traditional way would far exceed the memory limit of consumer-grade devices. Apple has addressed this with a sparse activation architecture: the entire model is stored in NAND flash, and during inference, instead of loading all the weights, it selects the expert modules needed for the current inference in a single step based on the input prompt and transfers only a working set of 1 to 4 billion parameters to DRAM.
Schematic diagram of the AFM 3 Core Advanced model architecture
Unlike a traditional MoE model, which switches experts token by token and causes frequent data transfers, Apple uses a routing mechanism that operates at prompt granularity, combined with a high proportion of shared experts that always reside in DRAM. This significantly reduces the number of exchanges between flash and DRAM and minimizes loading delay. Together with optimizations such as instruction-following pruning (IFP) and Transformer layer streamlining, the peak DRAM usage of the 20-billion-parameter model is kept within 2GB to 8GB, further balancing memory usage and computing efficiency. This effectively addresses the excessive DRAM usage of on-device MoE deployment, allowing the model to run smoothly on devices such as the iPhone and achieving "large model, small memory" on-device inference.
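Why routing granularity matters for flash traffic can be shown with a small counting exercise. Apple's router is not described here, so the sketch uses a hash of the token text as a stand-in for a learned router, and the expert count and DRAM slot count are hypothetical. It only counts expert loads from flash and says nothing about model quality.

```python
from collections import Counter

NUM_EXPERTS = 16  # hypothetical number of routed experts stored in flash
DRAM_SLOTS = 4    # hypothetical number of routed experts that fit in DRAM at once


def token_expert(token: str) -> int:
    """Toy stand-in for a learned router: map a token to one expert id."""
    return sum(token.encode()) % NUM_EXPERTS


def token_level_loads(tokens) -> int:
    """Route each token separately; count flash-to-DRAM loads with FIFO replacement."""
    resident, loads = [], 0
    for token in tokens:
        expert = token_expert(token)
        if expert not in resident:
            loads += 1
            if len(resident) == DRAM_SLOTS:
                resident.pop(0)          # evict the expert that was loaded first
            resident.append(expert)
    return loads


def prompt_level_loads(tokens) -> int:
    """Choose one working set from the whole prompt, load it once, and keep it."""
    votes = Counter(token_expert(token) for token in tokens)
    working_set = [expert for expert, _ in votes.most_common(DRAM_SLOTS)]
    return len(working_set)


prompt = list("abcde") * 3               # 15 tokens that map to 5 distinct experts
print("token-level loads: ", token_level_loads(prompt))
print("prompt-level loads:", prompt_level_loads(prompt))
```

With these toy settings the token-level scheme loads an expert on every one of the 15 tokens, because five experts cycle through four slots, while the prompt-level scheme loads four experts once. The trade-off, which this sketch does not model, is that a fixed working set cannot adapt to individual tokens.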
This architecture is not the product of a short-term research effort.
As early as 2024, Apple's research team published the paper "LLM in a Flash", which systematically validated the approach of storing large-model parameters in flash memory and scheduling them on demand. While reducing cloud computing costs, it provides a feasible memory architecture for on-device AI applications, achieving inference speeds 4 to 5 times faster on CPUs and 20 to 25 times faster on GPUs than naive loading.
As DRAM price increases spread from industry to consumer electronics, this solution not only supports the on-device AI experience but also reduces devices' reliance on high-capacity DRAM.
Overall, the AMD and Apple approaches are evolving in parallel for data centers and end devices respectively, but they point to the same conclusion: the memory hierarchy for AI inference is being reconfigured. Infrequently accessed KV caches, model weights, and end-device data will gradually move from high-cost HBM/DRAM to the NAND flash/SSD layer, forming a multi-level storage architecture.
This architectural change is rippling through multiple levels of the industrial chain. According to Citrini Research, the most direct beneficiaries are NAND manufacturers.
Marvell: Hardware Compression + CXL, Expanding Physical Memory Capacity
While AMD and Apple have taken the routes of software and architecture optimization, Marvell has chosen to make its breakthrough at the hardware level. Building on the CXL high-speed interconnect protocol, it uses inline hardware compression to directly increase the equivalent capacity of physical DRAM.
In June 2026, Marvell released the Structera series of CXL controllers: Structera X (a memory expansion controller) and Structera A (a near-memory accelerator). Both chips include a self-developed CDB (Compression-Decompression Block) hardware compression module.
When data is written to DRAM, the CDB module reportedly compresses it in real time using a customized LZ4 lossless algorithm, and decompresses it on the fly when it is read. The entire process runs independently in the memory path, consumes no host CPU compute, and is completely transparent to upper-layer applications. Depending on the data type, 1GB of physical DRAM can provide 2 to 3.64 times the equivalent logical capacity. In mixed database workloads, the average compression ratio can reach 3.64:1, which means that less than one-third of the physical memory can meet the same business requirements.
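The relationship between compression ratio and physical memory can be written out directly. This is a minimal sketch of the arithmetic only; the function names are illustrative, and real ratios depend on how compressible the data is.

```python
def logical_capacity_gb(physical_gb: float, ratio: float) -> float:
    """Equivalent logical capacity exposed by compressed physical DRAM."""
    return physical_gb * ratio


def physical_needed_gb(logical_gb: float, ratio: float) -> float:
    """Physical DRAM needed to hold a given logical capacity at a compression ratio."""
    return logical_gb / ratio


for ratio in (2.0, 3.64):
    share = physical_needed_gb(1.0, ratio)
    print(f"{ratio}:1 compression -> {share:.1%} of the uncompressed physical DRAM")
```

At 3.64:1 the physical requirement is about 27.5% of the uncompressed figure, which is the "less than one-third" stated above; at 2:1 it is 50%.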
The solution also reduces costs in two further ways. First, it enables the reuse of old memory: the Structera X controller supports DDR4, so retired DDR4 modules can be incorporated into the CXL memory pool, reducing the need to buy expensive new DDR5. Second, it enables memory pooling: the CXL protocol removes the restriction that memory is accessible only to a single CPU, allowing multiple servers to share memory resources and absorb idle capacity in the system.
Based on the current DDR5 spot price of $27 to $37 per GB, the DRAM hardware for a 12TB memory pool costs close to $500,000. Estimated at a 3-fold compression ratio, the physical DRAM procurement volume can be