Lisa Su invests in an AI unicorn that sells only AMD computing power.
XinDongXi reported on May 20 that, according to a Forbes report published the previous day, US AI model developer Zyphra is raising a new $500 million funding round (approximately RMB 3.4 billion), with US chip giant AMD among the investors. Sources said the round will value Zyphra at no less than $5 billion (approximately RMB 34.1 billion).
Zyphra was founded in 2020. It develops advanced open-source AI models and provides cloud infrastructure services. While most AI labs default to NVIDIA chips, Zyphra is fully committed to AMD: both its model training and its inference run on AMD hardware, which lowers costs and also gives it a supply-chain advantage.
Zyphra Cloud is a new full-stack cloud platform powered by AMD technology. It is designed for AI-native startups, enterprises, and cutting-edge AI hyperscale data centers.
The platform was initially based on the AMD MI355X GPU, supporting serverless inference of leading open-source models. It has now expanded to bare-metal AMD infrastructure, offering two main deployment modes: on-demand bare-metal GPU clusters for flexible workloads, and customized hyperscale AMD infrastructure for large-scale training and inference deployments.
Currently, 15 megawatts of MI355X capacity is in use.
Within Zyphra Cloud, Zyphra Inference provides production-grade model serving. It is designed for large MoE models and long-running agent workloads with long contexts and large KV and prefix caches. Running on MI355X GPUs in partnership with TensorWave, it can serve advanced open-source models such as Kimi-K2.6, DeepSeek-V3.2, and GLM-5.1. Zyphra lists prices for each of these model services.
Yesterday, Zyphra announced the first end-to-end benchmark results for its inference stack on MI355X. It said its optimized inference performs significantly better than the AMD baseline and narrows the performance gap between MI355X and B200 when running models such as Kimi-K2.6, GLM-5.1, and DeepSeek-V3.2.
Its optimizations include:
Tree-based attention: a balanced tree-based reduction for long-context attention
TSP: keeping model-parallel groups on intra-node links
Tuning across kernels, HIP graphs, and RCCL
EAGLE speculative decoding tuned for ROCm
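To make the tree-based attention item concrete, here is a minimal sketch in plain Python. It is not Zyphra's implementation; it only illustrates the general idea that softmax attention computed over separate KV shards can be merged with an associative combine step, which is what allows a balanced tree reduction instead of a sequential pass. All function names are hypothetical, and each "shard" is an ordinary list standing in for the KV cache held by one device.

```python
import math

def partial_attention(q, keys, values):
    """Attention of one query over one KV shard.

    Returns (max_score, sum_of_exp, unnormalized_output) so that
    shards can be merged later without revisiting the raw scores.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    numer = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), numer

def combine(a, b):
    """Associative merge of two partial results via log-sum-exp rescaling."""
    m_a, d_a, n_a = a
    m_b, d_b, n_b = b
    m = max(m_a, m_b)
    s_a, s_b = math.exp(m_a - m), math.exp(m_b - m)
    return m, d_a * s_a + d_b * s_b, [x * s_a + y * s_b for x, y in zip(n_a, n_b)]

def tree_reduce(parts):
    """Merge partial results pairwise, halving the list each round."""
    while len(parts) > 1:
        merged = [combine(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:  # odd element carries over to the next round
            merged.append(parts[-1])
        parts = merged
    return parts[0]

def tree_attention(q, key_shards, value_shards):
    """Attention over sharded KV: local partials, then a tree reduction."""
    parts = [partial_attention(q, k, v) for k, v in zip(key_shards, value_shards)]
    _, denom, numer = tree_reduce(parts)
    return [x / denom for x in numer]

def reference_attention(q, keys, values):
    """Plain softmax attention over the unsharded KV, for comparison."""
    _, denom, numer = partial_attention(q, keys, values)
    return [x / denom for x in numer]
```

Because the combine step is associative, the number of merge rounds grows with the logarithm of the shard count rather than linearly, which is one plausible reason such a scheme pays off more as contexts get longer and are split across more devices.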
Longer contexts play to its strengths. As context length grows, TSP and tree-based attention deliver larger performance gains, narrowing the gap with B200. On single-request decoding and time to first token (TTFT), B200 still leads both Zyphra's technology stack and the AMD baseline, though Zyphra says it sees a path to narrowing the gap.
Zyphra explained why it chose MI355X: each GPU carries 288GB of HBM3E, versus 180GB on the B200. That leaves room for more resident KV and prefix caches, larger models, and longer contexts, which in turn means lower latency and higher throughput.
Compared with B200, MI355X offers roughly twice the single-node HBM memory budget for GLM-5.1, DeepSeek-V3.2, and the upcoming DeepSeek-V4-Pro.
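The per-GPU figures alone give a ratio of 288/180 = 1.6, so "roughly twice" presumably refers to the memory left over once model weights are loaded. The short calculation below is one possible reading of that claim, not Zyphra's own arithmetic. The 288GB and 180GB figures come from the article; the 8-GPU node size and the 700GB weight footprint are assumptions chosen purely for illustration.

```python
MI355X_HBM_GB = 288  # per-GPU HBM3E, from the article
B200_HBM_GB = 180    # per-GPU HBM, from the article

GPUS_PER_NODE = 8              # assumption: not stated in the article
HYPOTHETICAL_WEIGHTS_GB = 700  # assumption: illustrative weight footprint

def node_hbm_budget_gb(hbm_per_gpu_gb, weights_gb, gpus_per_node=GPUS_PER_NODE):
    """HBM left on one node for KV and prefix caches after loading weights."""
    return hbm_per_gpu_gb * gpus_per_node - weights_gb

raw_ratio = MI355X_HBM_GB / B200_HBM_GB
mi355x_budget = node_hbm_budget_gb(MI355X_HBM_GB, HYPOTHETICAL_WEIGHTS_GB)
b200_budget = node_hbm_budget_gb(B200_HBM_GB, HYPOTHETICAL_WEIGHTS_GB)
budget_ratio = mi355x_budget / b200_budget

print(f"raw per-GPU ratio: {raw_ratio:.2f}")
print(f"MI355X node budget: {mi355x_budget} GB")
print(f"B200 node budget:   {b200_budget} GB")
print(f"budget ratio after weights: {budget_ratio:.2f}")
```

Under these assumptions the leftover budget is 1604GB versus 740GB. The fixed cost of the weights takes a larger share of the smaller node, so the ratio of usable cache memory exceeds the raw 1.6x capacity ratio, and the larger the model, the wider that gap becomes.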
Next, Zyphra plans to support DeepSeek-V4-Pro, scale to 1.6T parameters and 1M-token contexts, and work on training-aware quantization, diffusion-based speculators, and serving engines.
The company also plans to extend its support to next-generation AMD platforms, including the MI450 series and subsequent products.
This article is from the WeChat official account “XinDongXi”, written by ZeR0 and published by 36Kr with authorization.