LPDDR Makes a Comeback: Why Are AI Inference Chips Betting on a New Memory Architecture?

Why Are AI Inference Chips Collectively Turning to LPDDR?

For a long time, LPDDR (Low Power Double Data Rate SDRAM) was used mainly in power-saving consumer electronics such as smartphones and thin-and-light laptops. In recent years, however, it has spread rapidly into data centers and is becoming the common choice for AI inference chips in every scenario from the edge to the cloud.

Both large international companies and Chinese companies are increasingly relying on LPDDR as the memory solution for their inference products. Behind this decision are considerations of cost, power consumption, and performance.

LPDDR Becomes the "General Solution" for AI Inference

LPDDR is now moving from consumer electronics into the AI chip market. It is not only the main memory option for inference GPUs but also extends to AI-specific CPUs, desktop AI supercomputers, and other categories.

Data Centers

Qualcomm's AI200/AI250 is currently one of the most aggressive LPDDR-based data-center systems. The AI200 accelerator card, part of Qualcomm's first data-center inference system, is designed specifically for rack-level AI inference. Its goal is a low total cost of ownership and optimized performance for large language models (LLMs), large multimodal models (LMMs), and other AI inference workloads. Each accelerator card supports 768 GB of LPDDR memory, which provides higher memory capacity at lower cost along with good scalability and flexibility. The AI200 is expected to reach the market in 2026. The AI250, planned for 2027, builds on the AI200's core architecture and adds significant technical improvements.

Intel has introduced the first data-center GPU based on its Xe3P architecture, code-named "Crescent Island". The GPU is optimized for AI inference and agentic workloads and uses 480 GB of LPDDR5 memory. Its TDP is only 350 W. Because it does not use HBM, it can be deployed directly in existing air-cooled data centers without liquid cooling.

The Qwang S3 from Xiwàng Technology represents Chinese chips. It is a Chinese GPGPU with LPDDR6 memory, and Xiwàng claims that it improves inference price-performance more than tenfold and cuts the cost per token by 90%.

Edge AI

Edge AI is the most established area for LPDDR.

DEEPX is a South Korean semiconductor company that focuses on edge computing and edge AI. Its CEO recently explained that the company will integrate LPDDR-PIM, a processing-in-memory solution, into its products. PIM integrates a dedicated data processor directly into the DRAM, so that part of the data processing can be offloaded from the host processor to the memory. This reduces data movement and improves the energy efficiency and processing efficiency of the AI accelerator system. Samsung Electronics is the only supplier of LPDDR5X-PIM, so DEEPX will pair its 2 nm edge AI chip, the DX-M2, which delivers 80 TOPS, with Samsung's LPDDR5X-PIM solution. The follow-on DX-M3 will use the JEDEC-standardized LPDDR6-PIM.

On the supplier side, Longsys has developed two LPDDR memory solutions specifically for edge AI inference: AIDIMM (socketed) and AILPBGA (soldered down). The AIDIMM uses four LPDDR5X modules in a coplanar arrangement and offers a maximum capacity of 128 GB, a 256-bit bus width, and an ultra-high bandwidth of 307.2 GB/s per channel. The AILPBGA uses a proprietary technical standard and a new architecture. It has a native 256-bit bus width, a bandwidth of up to 307 GB/s, and a capacity of 24 GB to 64 GB. It is fully compatible with standard LPDDR interfaces and comes in a compact 22 × 22 mm BGA1764 package.

Desktops and AI PCs

Beyond dedicated AI inference chips, LPDDR also sets the performance ceiling of AI PCs through the "unified memory architecture".

AMD Strix Halo (Ryzen AI MAX series) uses a 256-bit LPDDR5X interface and supports up to 128 GB of LPDDR5X-8000 memory with a bandwidth of 256 GB/s. The more aggressive Apple M4 Max uses a 512-bit LPDDR5X memory bus (32 × 16-bit controllers) and reaches a bandwidth of 546 GB/s with memory-on-package (MoP) packaging, making it a benchmark for running large language models locally. NVIDIA is not to be outdone. Its recently introduced RTX Spark superchip is designed for thin-and-light notebooks and small desktop computers. It integrates a Blackwell-architecture RTX GPU and a 20-core Grace CPU and has up to 128 GB of unified LPDDR5X memory. Its AI compute reaches 1 petaFLOPS, combining high performance with low power consumption and a small footprint.

In addition, NVIDIA has introduced the "world's most powerful desktop AI supercomputer", the DGX Station for Windows. The DGX Station is powered by a GB300 Grace Blackwell Ultra Desktop Superchip, in which the Blackwell Ultra GPU is connected to a 72-core Grace CPU over NVIDIA NVLink-C2C. It offers up to 748 GB of coherent memory and FP4 performance of up to 20 petaFLOPS. The 748 GB of coherent memory consists of 496 GB of LPDDR5X CPU memory (396 GB/s) and 252 GB of HBM3e GPU memory. The NVLink-C2C link provides high-bandwidth, coherent data exchange between the CPU and the GPU.
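The bandwidth figures quoted above follow from bus width and per-pin data rate. The short Python sketch below shows the arithmetic; the function names are illustrative, 1 GB is taken as 10^9 bytes, and the result is a theoretical peak rather than a measured value. The Apple data rate is not stated in the article, so the sketch only derives the rate implied by the quoted 546 GB/s.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits: total width of the memory interface in bits
    data_rate_mts:  per-pin data rate in megatransfers per second
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9


def implied_data_rate_mts(bus_width_bits: int, bandwidth_gbs: float) -> float:
    """Invert the formula: data rate (MT/s) implied by a quoted bandwidth."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gbs * 1e9 / bytes_per_transfer / 1e6


if __name__ == "__main__":
    # AMD Strix Halo: 256-bit bus, LPDDR5X-8000 (figures from the article)
    print(f"Strix Halo peak: {peak_bandwidth_gbs(256, 8000):.1f} GB/s")
    # Apple M4 Max: 512-bit bus, 546 GB/s quoted; the data rate is derived here
    print(f"M4 Max implied rate: {implied_data_rate_mts(512, 546):.0f} MT/s")
```

The same formula explains why widening the bus is the main lever for LPDDR designs: doubling the interface from 256 to 512 bits doubles peak bandwidth at an unchanged data rate.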

Why LPDDR?

LPDDR stands for Low Power Double Data Rate SDRAM, that is, low-power double-data-rate synchronous dynamic random-access memory. As the name suggests, everything about it is designed around saving power. It is usually soldered directly onto the motherboard of a mobile phone or ultra-thin laptop, close to the processor, and is almost never replaceable. Why have so many manufacturers independently converged on LPDDR?

First, LPDDR costs less and its supply is more secure. HBM consists of several DRAM dies stacked vertically into a 3D structure, with the layers connected by through-silicon vias (TSVs) and microbumps. As the number of stacked layers grows, yield falls exponentially. Because of the complex 3D stacking process and the limited yield, HBM has long been in short supply. LPDDR, in contrast, is built on a mature planar DRAM process and on the mass production of consumer electronics. Its cost per unit of capacity is much lower than that of HBM, which can significantly reduce the hardware investment and total cost of ownership of AI inference servers. It is particularly suitable for inference scenarios with large-scale deployment.
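The exponential relationship between stack height and yield can be illustrated with a deliberately simple model. The sketch below assumes that every layer, including its bonding step, succeeds independently with the same probability, so the stack yield is that probability raised to the number of layers. The 98% per-layer figure is a placeholder, not a vendor number, and real processes use known-good-die testing and repair that this model ignores.

```python
def stack_yield(per_layer_yield: float, layers: int) -> float:
    """Yield of a stack when each layer must succeed independently.

    Simplified assumption: identical, independent per-layer success
    probability (die plus bonding). Real HBM yield models are more complex.
    """
    if not 0.0 <= per_layer_yield <= 1.0:
        raise ValueError("per_layer_yield must be between 0 and 1")
    if layers < 1:
        raise ValueError("layers must be at least 1")
    return per_layer_yield ** layers


if __name__ == "__main__":
    PER_LAYER = 0.98  # hypothetical per-layer yield, for illustration only
    for layers in (1, 4, 8, 12, 16):
        print(f"{layers:2d} layers: {stack_yield(PER_LAYER, layers):.1%}")
```

A planar LPDDR die corresponds to the single-layer case, which is one way to read the cost gap described above.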

Second, LPDDR offers higher capacity, so a single card can hold larger models and support longer contexts. The 768 GB of LPDDR on the Qualcomm AI200 is the largest per-card capacity in the industry. By comparison, the NVIDIA GB300 has only 288 GB of HBM3e per GPU and the AMD MI450X has 432 GB of HBM4. HBM is usually connected to the processor through a silicon interposer in a 2.5D package and must sit right next to the compute die, which limits the maximum total capacity of a system. The fixed width and position of the PHY interface also constrain the layout. In contrast, the IP ecosystem for LPDDR5X controllers is mature and the cost of integration is low. PCB and substrate layout and the signal-training process are highly standardized. System scaling is also very flexible: more bandwidth comes from running more channels in parallel, and more capacity comes from adding memory devices.
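The link between memory capacity, model size, and context length can be sketched with back-of-envelope arithmetic. The example below is not a description of any vendor's software: the model configuration (400 billion parameters, 120 layers, 8 KV heads, head dimension 128, 8-bit weights, 16-bit KV cache) is hypothetical, and the KV-cache formula is the standard estimate for a decoder-only transformer. Only the three capacities come from the article.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabyte, used for a rough estimate


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical decoder-only model; every value is illustrative."""
    params_billion: float
    layers: int
    kv_heads: int
    head_dim: int
    bytes_per_weight: float = 1.0  # e.g. 8-bit quantized weights
    bytes_per_kv: float = 2.0      # e.g. 16-bit KV-cache entries


def weight_bytes(m: ModelConfig) -> float:
    """Memory needed to hold the model weights."""
    return m.params_billion * 1e9 * m.bytes_per_weight


def kv_bytes_per_token(m: ModelConfig) -> float:
    """KV cache per token: one key and one value vector per layer and KV head."""
    return 2 * m.layers * m.kv_heads * m.head_dim * m.bytes_per_kv


def max_cached_tokens(m: ModelConfig, capacity_gb: float) -> int:
    """Tokens of KV cache that fit after the weights are loaded (0 if none)."""
    free = capacity_gb * GB - weight_bytes(m)
    if free <= 0:
        return 0
    return int(free // kv_bytes_per_token(m))


if __name__ == "__main__":
    model = ModelConfig(params_billion=400, layers=120, kv_heads=8, head_dim=128)
    # Capacities quoted in the article
    for name, cap in (("AI200, LPDDR", 768), ("MI450X, HBM4", 432), ("GB300, HBM3e", 288)):
        print(f"{name:14s} {cap:4d} GB -> {max_cached_tokens(model, cap):,} cached tokens")
```

Under these assumptions the weights alone exceed the smallest capacity, while the largest leaves room for a long context or many concurrent requests on a single device.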

Third, LPDDR consumes less power. Intel's Crescent Island is an example: by choosing LPDDR instead of HBM, Intel keeps power consumption at 350 W, so the card can run in existing air-cooled data centers without liquid cooling. This avoids the investment in data-center liquid cooling and shortens construction time.

Finally, inference needs less memory bandwidth than training. In training, back-propagation over large amounts of data requires extremely high memory bandwidth, which makes HBM indispensable. In inference, the model parameters are fixed, and the emphasis shifts to large memory capacity and efficient retrieval. The capacity and cost advantages of LPDDR far outweigh its bandwidth disadvantage. At this year's GTC conference, NVIDIA CEO Jensen Huang said that the AI inference market has reached its inflection point and that AI is moving from the training phase to the inference and execution phase. Demand for inference compute is growing exponentially. Compared with traditional training chips, inference chip design places more emphasis on power control, cost efficiency, and deployment flexibility, and these are areas where LPDDR has clear advantages.

The fact that the chip giants are all turning to LPDDR is not a coincidence but an industry-wide adaptation. Some institutions project that by 2030 the number of inference workloads will be 100 times that of training workloads. Using LPDDR in data-center AI chips is a deliberate decision aimed at inference. Given the supply bottleneck and high prices of HBM, manufacturers such as Qualcomm and Intel have built a differentiated strategy around LPDDR. It is particularly suitable for scenarios such as LLM inference, video analysis, and recommendation systems, where bandwidth requirements are relatively manageable and capacity requirements are extremely high. The LPDDR approach has its costs, of course. Compared with HBM, it has lower memory bandwidth, higher latency because of the narrower interface, and unproven reliability in round-the-clock, high-temperature server environments.
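The bandwidth cost mentioned above can be made concrete with a common rule of thumb: when token generation is limited by memory bandwidth, each decoding step has to stream the model weights from memory once, so throughput is bounded by bandwidth divided by weight size. The sketch below applies that bound under loud assumptions: a dense model, a hypothetical 70 GB of weights, no KV-cache traffic, and no compute limit. The bandwidth values are the LPDDR figures quoted earlier in the article, and the function name is illustrative.

```python
def decode_tokens_per_second_bound(
    bandwidth_gb_s: float, weights_gb: float, batch_size: int = 1
) -> float:
    """Upper bound on generated tokens per second when decoding is memory-bound.

    Assumes one full pass over the weights per decoding step, shared by all
    sequences in the batch. Ignores KV-cache reads and compute limits.
    """
    if bandwidth_gb_s <= 0 or weights_gb <= 0 or batch_size < 1:
        raise ValueError("inputs must be positive")
    steps_per_second = bandwidth_gb_s / weights_gb
    return steps_per_second * batch_size


if __name__ == "__main__":
    WEIGHTS_GB = 70.0  # hypothetical model size, not from the article
    for name, bw in (("Strix Halo", 256), ("DGX Station CPU memory", 396), ("M4 Max", 546)):
        bound = decode_tokens_per_second_bound(bw, WEIGHTS_GB)
        print(f"{name:24s} {bw} GB/s -> at most {bound:.1f} tokens/s per sequence")
```

The batch parameter hints at why large capacity can partly offset lower bandwidth: serving more sequences per weight pass raises aggregate throughput, although each extra sequence also adds KV-cache traffic that this bound leaves out.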

Explosion of LPDDR Demand

The direct consequence of manufacturers' decision to switch to LPDDR is an exponential growth in LPDDR demand.

Take the Qualcomm AI200 as an example. A single rack can hold several dozen accelerator cards, each with 768 GB of memory, so the total memory capacity can reach several dozen terabytes. At 12 to 16 GB per flagship smartphone, one rack is equivalent to the memory of several thousand phones, and a deployment of hundreds of racks reaches the equivalent of hundreds of thousands or even millions of smartphones. And this is just one product from one company. If Qualcomm, Intel, NVIDIA, and other potential competitors (such as AMD and Broadcom) start mass-producing LPDDR solutions in 2026 and 2027, the demand for LPDDR will increase exponentially.
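The rack-level arithmetic can be sketched as follows. The 768 GB per card comes from the article; the card counts per rack are hypothetical stand-ins for "several dozen", and the 12 to 16 GB per phone is the flagship smartphone range the article quotes for LPDDR5X. The binary convention (1 TB = 1024 GB) matches the article's own 1.5 TB = 1536 GB.

```python
GB_PER_TB = 1024  # binary convention, as in the article's 1.5 TB = 1536 GB


def rack_capacity_gb(cards_per_rack: int, gb_per_card: int = 768) -> int:
    """Total LPDDR capacity of one rack in GB."""
    return cards_per_rack * gb_per_card


def phone_equivalents(total_gb: float, gb_per_phone: int) -> float:
    """Number of smartphones whose memory adds up to total_gb."""
    return total_gb / gb_per_phone


if __name__ == "__main__":
    for cards in (24, 48, 72):  # hypothetical values for "several dozen"
        total = rack_capacity_gb(cards)
        low = phone_equivalents(total, 16)
        high = phone_equivalents(total, 12)
        print(
            f"{cards} cards: {total / GB_PER_TB:.1f} TB, "
            f"about {low:,.0f} to {high:,.0f} phones"
        )
```

Under these assumptions a single rack corresponds to a few thousand phones, so phone counts in the hundreds of thousands imply a deployment of many racks.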

Some analysts estimate that the LPDDR consumption of NVIDIA Vera Rubin AI servers will rise from 3.144 billion GB in 2026 to 6.041 billion GB in 2027, accounting for 36% of total global LPDDR supply in 2027. For the first time, this figure will exceed the combined consumption of Apple (2.966 billion GB) and Samsung (2.724 billion GB).
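These figures can be cross-checked against each other with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable and function names are illustrative, and the implied global supply is a derived value, not a figure from the analysts. The article does not state which year the Apple and Samsung figures refer to, so the comparison assumes they are comparable to the 2027 value.

```python
# Analyst figures quoted in the article, in billions of GB
VERA_RUBIN_2026 = 3.144
VERA_RUBIN_2027 = 6.041
SHARE_OF_SUPPLY_2027 = 0.36
APPLE = 2.966
SAMSUNG = 2.724


def growth_factor(before: float, after: float) -> float:
    """Year-over-year multiple."""
    return after / before


def implied_total_supply(consumption: float, share: float) -> float:
    """Total supply implied by a consumption figure and its share of supply."""
    return consumption / share


if __name__ == "__main__":
    print(f"2026 to 2027 growth: {growth_factor(VERA_RUBIN_2026, VERA_RUBIN_2027):.2f}x")
    print(
        "Implied 2027 global supply: "
        f"{implied_total_supply(VERA_RUBIN_2027, SHARE_OF_SUPPLY_2027):.2f} billion GB"
    )
    combined = APPLE + SAMSUNG
    print(f"Apple + Samsung: {combined:.3f} billion GB")
    print(f"Vera Rubin exceeds both combined: {VERA_RUBIN_2027 > combined}")
```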

Vera Rubin is NVIDIA's latest AI platform, whose mass production was announced at GTC 2026. It consists of a Vera CPU and a Rubin GPU. The Vera CPU has 88 custom Armv9.2 "Olympus" cores, and the system supports up to 1.5 TB (1,536 GB) of LPDDR5X memory in the new SOCAMM2 module format. By comparison, a flagship smartphone usually has only 12 to 16 GB of LPDDR5X memory. The LPDDR capacity attached to a single Vera CPU is therefore roughly 96 to 128 times that of a smartphone.

AMD is also driving up LPDDR demand. Its sixth-generation EPYC server processor, "Verano", planned for 2027, will be the first to support LPDDR5X memory in the SOCAMM2 form factor.

The supply logic of LPDDR is completely different from that of HBM. HBM supply is controlled by three vendors, SK Hynix, Samsung, and Micron, and production capacity is severely constrained. LPDDR, by contrast, has larger production capacity thanks to consumer electronics and a more mature supply chain. If the AI giants all turn to LPDDR, LPDDR production capacity will become a key factor in the expansion of AI infrastructure.

LPDDR6: From "Good Enough" to "Good to Use"

The JEDEC Solid State Technology Association officially presented the LPDDR6 roadmap in April 2026.

The most remarkable point in this roadmap is that the capacity of a single LPDDR6 memory chip could reach up to 512 GB. This specification directly exceeds today's standard server DDR5, where a single module usually holds between 64 GB and 128 GB. The difference in the capacity of a single die is even greater. Such an extreme...

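The views expressed in this article are solely those of the author. The 36Kr platform only provides information storage services.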
