Still Sniping at Nvidia, AMD's Bullets Are Loaded | Focus Analysis
Author | Qiu Xiaofen
Editor | Su Jianxun
On October 10, at its annual Advancing AI conference, American chip giant AMD announced a series of major product updates, covering AI PC processors, EPYC server processors, DPUs, and more.
With Nvidia's latest Blackwell chip facing delivery difficulties, how AMD plays its hand with its newest GPU line, the AMD Instinct MI325X, bears directly on AMD's future, and the product has naturally become the focus of industry attention.
Closely tracking Nvidia's specifications has been AMD's consistent strategy for the MI series, and the Instinct MI325X is no exception: it benchmarks directly against Nvidia's previous-generation H200.
The good news, judging from the product details, is that AMD is gradually finding a strategy of differentiated competition: this product concentrates on improving memory capacity and inference performance.
AMD Chair and CEO Lisa Su unveils the AMD Instinct MI325X series
First, the AMD Instinct MI325X carries 256GB of HBM3E high-bandwidth memory with 6TB/s of memory bandwidth, well ahead of the Nvidia H200's 141GB and 4.8TB/s.
Second, although the MI325X's FP16 (16-bit floating-point) compute trails Nvidia's, its inference performance is 20%-40% higher than the H200's overall.
Betting heavily on inference is a wise move. An industry insider told 36Kr that a major trend in computing centers this year is that, as some large-model makers scale back pre-training, demand for inference and model fine-tuning has risen.
"For a certain computing power center customer, the ratio of pre-training to inference was 7:3 last year, and this year it has completely reversed." The demand changes of downstream large model and application manufacturers require upstream chip manufacturers to make timely strategic shifts.
Differentiating on a single chip alone is far from enough, though. AMD has also moved at the system level to shore up its weaknesses in interconnect and software ecosystem, which is precisely where Nvidia's moat is deepest.
One of Nvidia's product advantages is NVLink, which keeps multiple chips performing at full strength when linked together, without losing compute to inter-chip communication overhead. This time, AMD is relying on its Infinity Fabric interconnect to make multi-card configurations outperform cards running individually.
According to AMD, a system of eight Instinct MI325X cards offers 1.8x the memory, 1.3x the memory bandwidth, and 1.3x the compute of Nvidia's comparable offering, the H200 HGX.
On the software-ecosystem front, AMD also continues to close the gap. Through continuous optimization and deep cooperation with multiple AI development platforms, its ROCm software platform no longer holds the hardware back and even improves overall efficiency.
In AMD's own tests running Meta's Llama-2 model, a single MI325X backed by ROCm exceeds the Nvidia H200 in training efficiency, and an eight-card AMD cluster remains on par with the H200 HGX.
At the earlier Computex in Taipei, AMD Chair and CEO Lisa Su had already made clear that AMD's GPU cadence would match Nvidia's, with updates "once a year." Beyond launching the Instinct MI325X series, AMD also offered a glimpse of future products:
According to AMD, the next-generation Instinct MI350 series will launch in the second half of next year and continues this generation's product logic: inference performance improved 35-fold, 288GB of HBM3E memory, and peak compute raised 1.8x, putting it on par with Nvidia's B200.
With its product strategy and release cadence now clarified, AMD is on track to make rapid gains in the data-center market in 2024.
Lisa Su has previously disclosed that AMD has won orders from hundreds of AI customers and OEMs, and that its share of the data-center server market has climbed from a once-negligible single digits to about 30% today.
Financial results tell the story best. Figures AMD released in July show second-quarter data-center revenue of $2.8 billion. Though still far behind Nvidia's $22.6 billion, that is up 115% year over year and makes data center the fastest-growing segment of AMD's business.
AMD's data-center breakthrough is in fact the product of several factors: the previous-generation MI300 series hit on the right strategy and became the best-selling product in AMD's history, compounded by the broader boom in AI computing centers and by its rival's missteps.
Throughout last year, Nvidia's GPUs were dogged by capacity constraints, with delivery lead times stretching to an astonishing 8-11 months. Supply did not ease until the first quarter of 2024, and even then customers still faced a three-month wait.
But the good times did not last. Just as Nvidia's H series finally hit peak shipments this year, its latest Blackwell chips ran into a new round of delivery problems.
Reports from multiple sources indicate that Nvidia's new Blackwell chips, originally slated for production in the third quarter of this year, suffered a chip-design defect that hurt stability, along with supply-chain problems such as low packaging yields, pushing the overall schedule back by another quarter.
With its rival continuously dogged by production and design problems, AMD's products have naturally become the go-to option for filling the compute gap.
Nvidia, for its part, worried about missing the market window and ceding share, is trying hard to shake off the shadow of its delays.
Around the same time as AMD's conference, Morgan Stanley held a three-day non-deal roadshow for Nvidia, stressing to investors that Blackwell's problems had been solved, that demand was red-hot, and that Nvidia "has sold all the chips within the next year."
Jensen Huang has hinted at this repeatedly in public, saying "this chip is the product every customer most wants, and everyone wants to be the first to receive it."
For AMD, that may not be good news. Fortunately, even as its rival's capacity troubles dissipate, AMD has gradually found its own competitive rhythm. In 2025 the two chip giants will return to head-on competition in the GPU arena, and it will be a crucial year for testing the true, comprehensive strength of both sides.