A "DeepSeek moment" from Tsinghua has Silicon Valley in a frenzy: with a 200x speedup on a single card, video generation enters the second-level era.
[Introduction] The "DeepSeek Moment" in the field of video generation has arrived! Tsinghua University has open-sourced TurboDiffusion, pulling AI video generation from the "minute-level" into the "second-level" real-time era. A 200x acceleration on a single card enables ordinary graphics cards to produce high-quality videos!
Just now, another DeepSeek Moment has arrived in the AI community!
Tsinghua University's TSAIL Laboratory, in collaboration with Shengshu Technology, has officially released and open-sourced the video generation acceleration framework TurboDiffusion.
As soon as the framework was released, it sparked heated discussion across the global AI community. Researchers and engineers from OpenAI, Meta, the vLLM project, and other institutions and open-source communities have praised and shared it.
Why has TurboDiffusion caused such a big reaction?
To sum it up in one sentence: without noticeably affecting generation quality, it boosts video generation speed by 100–200x!
From this moment on, AI video generation has officially moved from the "minute-level" era into the "real-time" era!
What is TurboDiffusion? Why is it so powerful?
With the development of large AI models, video generation is becoming one of the most important AI content creation directions after images and text.
In practice, however, we often find that although the models are powerful, generation is painfully slow.
Even with a top-of-the-line GPU like the H100, generating a short video without acceleration still takes several minutes, which seriously limits practical applications. Moreover, most creators only have consumer-grade graphics cards such as the RTX 5090 or 4090.
Therefore, whether the generation process can be significantly accelerated without sacrificing quality has become the key to whether AI video can enter everyday creative workflows.
TurboDiffusion arrives at exactly the right time.
Github: https://github.com/thu-ml/TurboDiffusion
Technical report: https://jt-zhang.github.io/files/TurboDiffusion_Technical_Report.pdf
Recently, Tsinghua University, in collaboration with Shengshu Technology, has open-sourced the video generation acceleration framework TurboDiffusion.
It is a tool specifically designed for accelerating Diffusion models, especially good at handling video generation scenarios.
Like a turbo engine, it delivers a 100–200x speedup on a single RTX 5090 graphics card.
Whether it is generating videos from images (I2V) or from text (T2V), it can handle them efficiently.
Even in the generation of high-resolution and long-duration videos, it can still maintain amazing acceleration performance.
Real-world tests: large models running at high speed
TurboDiffusion's strong performance is not just theoretical: the measured speedups on multiple video generation models are striking.
The following figure shows the powerful generation effect of TurboDiffusion.
Take a 1.3B model generating a 5-second video as an example: the standard official implementation takes 184 seconds.
The video generated by TurboDiffusion shows no obvious visual difference, yet it takes only 1.9 seconds, a speedup of roughly 97x under the same conditions.
For the image-to-video example of a cat taking a selfie, a 14B image-to-video model generates a 5-second 720P video. The picture quality is good, but the official standard implementation takes an extremely long time (4549s, more than an hour), far too slow for any real-time or interactive scenario.
With TurboDiffusion, details such as the underwater selfie and the cat surfing in sunglasses are fully preserved, and generation takes only 38 seconds. In other words, for a 14B image-to-video model generating a 5-second 720P video, TurboDiffusion achieves an almost lossless end-to-end speedup of about 120x (4549s → 38s) on a single RTX 5090.
This shows that even with a very large model, high resolution, and image-to-video generation, TurboDiffusion still delivers an order-of-magnitude inference speedup.
For a 14B text-to-video model to generate a 5-second 720P resolution video, TurboDiffusion can achieve an almost lossless end-to-end acceleration of 200 times on a single RTX 5090.
Even more impressively, applying the techniques included in TurboDiffusion to the Vidu model also achieves extremely high inference acceleration without any loss of video generation quality.
For example, generating an 8-second, 1080P high-definition video on the Vidu model originally took 900 seconds. After using TurboDiffusion, it only takes 8 seconds, truly achieving "what you see is what you get"!
Unveiling the four core cutting-edge technologies
The reason why TurboDiffusion can run so fast is due to the support of the following four cutting-edge technologies:
1. SageAttention: Low-bit quantization attention acceleration
In high-resolution video scenarios, the attention layers of a standard Transformer carry enormous computational overhead. TurboDiffusion uses SageAttention, developed in-house at Tsinghua University, to accelerate attention with low-bit quantization, squeezing the most out of the GPU for a substantial speedup.
GitHub link: https://github.com/thu-ml/SageAttention
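To make the idea concrete, here is a minimal sketch of how a low-bit quantized attention kernel can be dropped into an existing PyTorch model in place of standard attention. The `sageattn` call follows the usage shown in the SageAttention repository; treat the exact signature as indicative rather than authoritative.

```python
# Minimal sketch: swap standard attention for SageAttention's low-bit kernel.
# The sageattn signature follows the SageAttention repository's examples and
# may differ between versions -- check the repo before relying on it.
import torch
import torch.nn.functional as F

try:
    from sageattention import sageattn  # pip install sageattention
    _HAS_SAGE = True
except ImportError:
    _HAS_SAGE = False

def attention(q, k, v):
    """q, k, v: (batch, heads, seq_len, head_dim) CUDA tensors in fp16/bf16."""
    if _HAS_SAGE:
        # Q/K are quantized to low-bit integers inside the kernel, so the call
        # is a drop-in replacement -- no extra quantization code is needed here.
        return sageattn(q, k, v, tensor_layout="HND", is_causal=False)
    # Fallback: standard full-precision scaled-dot-product attention.
    return F.scaled_dot_product_attention(q, k, v)
```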
2. Sparse-Linear Attention (SLA): Sparse attention acceleration
In terms of sparse computation, TurboDiffusion introduces SLA (Sparse-Linear Attention).
Since sparse computation and low-bit Tensor Core acceleration are orthogonal, SLA can be built on top of SageAttention, significantly reducing redundant computation in the full attention matrix multiplication and delivering a several-fold additional speedup during inference.
GitHub link: https://github.com/thu-ml/SLA
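The actual SLA kernels live in the repository above; the toy sketch below only illustrates the "sparse" half of the idea, ranking key blocks by a pooled similarity score and keeping the top fraction for exact attention (in SLA, the remaining blocks are handled by a cheap linear-attention branch rather than simply dropped). All names here are illustrative.

```python
# Toy illustration of block-sparse attention selection -- not the SLA kernel.
# Pooled query/key blocks give a cheap importance estimate; only the
# top-scoring key blocks per query block are computed exactly.
import torch

def block_topk_mask(q, k, block=64, keep_ratio=0.25):
    """Return a boolean mask over (query block, key block) pairs to keep.

    q, k: (batch, heads, seq_len, head_dim), seq_len divisible by `block`.
    """
    B, H, N, D = q.shape
    qb = q.view(B, H, N // block, block, D).mean(dim=3)  # pooled query blocks
    kb = k.view(B, H, N // block, block, D).mean(dim=3)  # pooled key blocks
    scores = qb @ kb.transpose(-1, -2)                   # block-level similarity
    k_keep = max(1, int(scores.shape[-1] * keep_ratio))
    topk = scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool).scatter_(-1, topk, True)
    return mask  # True = compute this block with full (quantized) attention
```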
3. rCM step distillation acceleration: Generation with fewer steps
rCM, open-sourced by NVIDIA (NVlabs), is an advanced step-distillation method: through distillation training, a small number of sampling steps can reproduce the same quality as the original model.
Step distillation reduces the number of diffusion sampling steps needed at inference time, cutting latency without losing picture quality.
For example, where the original diffusion model needs 50–100 steps, rCM can compress this to 4–8 steps.
GitHub link: https://github.com/NVlabs/rcm
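The repository above contains the actual training and sampling code; the sketch below only shows why fewer steps translate directly into lower latency, using a generic "predict, then re-noise" few-step sampler of the kind consistency-style distilled models use. The denoiser and the noise schedules are placeholders.

```python
# Generic few-step "predict x0, then re-noise" sampler, as used by
# consistency-style distilled models. `denoiser` and the sigma schedules are
# placeholders, not rCM's actual code: the point is that latency scales with
# the number of denoiser calls, so 50 steps -> 4 steps removes most of the cost.
import torch

@torch.no_grad()
def sample(denoiser, shape, sigmas, device="cuda"):
    """One denoiser call per step; len(sigmas) - 1 calls in total."""
    x = torch.randn(shape, device=device) * sigmas[0]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0 = denoiser(x, sigma)                     # predict the clean sample
        x = x0 + sigma_next * torch.randn_like(x0)  # re-noise to the next level
    return x

# Base model: ~50 denoiser calls vs. a distilled model: 4 calls.
sigmas_base = torch.linspace(80.0, 0.0, steps=51)
sigmas_distilled = torch.tensor([80.0, 20.0, 5.0, 1.0, 0.0])
```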
4. W8A8 INT8 quantization: Linear layer acceleration
TurboDiffusion applies a W8A8 INT8 quantization strategy to the linear layers: model weights and activations are mapped to the 8-bit integer space, with block-wise quantization at a 128×128 block granularity. This balances speed and accuracy while significantly reducing inference power consumption and memory usage.
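A minimal sketch of the weight side of this idea follows: symmetric INT8 quantization with one scale per 128×128 block. TurboDiffusion's real kernels also quantize activations (the "A8" part) and run the matmul in INT8 on Tensor Cores; the snippet only shows what block-wise quantization at that granularity means.

```python
# Minimal sketch of block-wise symmetric INT8 quantization (weight side only).
# One scale per 128x128 block maps the block's largest magnitude to 127.
import torch

def quantize_int8_blockwise(w: torch.Tensor, block: int = 128):
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "pad to a multiple of 128 in practice"
    blocks = w.view(rows // block, block, cols // block, block)
    scales = blocks.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(blocks / scales), -128, 127).to(torch.int8)
    return q.view(rows, cols), scales.view(rows // block, cols // block)

def dequantize_int8_blockwise(q: torch.Tensor, scales: torch.Tensor, block: int = 128):
    rows, cols = q.shape
    blocks = q.view(rows // block, block, cols // block, block).float()
    return (blocks * scales.unsqueeze(1).unsqueeze(-1)).view(rows, cols)

# Round-trip check on a random linear-layer weight:
w = torch.randn(1024, 1024)
q, s = quantize_int8_blockwise(w)
print((dequantize_int8_blockwise(q, s) - w).abs().max())  # small quantization error
```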
These four core technologies were developed by the TSAIL team at Tsinghua University together with Shengshu Technology, and they represent milestone breakthroughs for the technology and industrial deployment of large multi-modal AI models. Among them, SageAttention is the world's first technique to apply quantization acceleration to attention computation and has been widely deployed in industry.
For example, SageAttention has been integrated into NVIDIA's TensorRT inference engine and deployed on mainstream GPU platforms such as Huawei Ascend and Moore Threads S6000. Leading technology companies and teams at home and abroad, including Tencent Hunyuan, ByteDance Doubao, Alibaba Tora, Shengshu Vidu, Zhipu Qingying, Baidu PaddlePaddle, Kunlun Wanwei, Google Veo3, SenseTime, and vLLM, have applied the technology in their core products, where its performance has generated considerable economic value.
How to get started?
TurboDiffusion is very convenient to use: the model checkpoints for image-to-video and text-to-video generation, along with the efficient inference code, have all been open-sourced.
Because it is easy to use, you can generate videos with just a few commands, even if you are not an expert in model training:
1. Install the Python package in the TurboDiffusion repository
Address: https://github.com/thu-ml/TurboDiffusion
2. Download the Checkpoints of the corresponding model (supporting image-to-video/text-to-video generation), such as TurboWan2.1-T2V-14B-720P.
3. Call the inference code to generate a video (a rough sketch follows below).
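As a rough sketch only: the script name and flags below are placeholders, not TurboDiffusion's actual command-line interface, so check the repository README for the real inference instructions. It simply shows the shape of step 3: point the inference code at a downloaded checkpoint and a prompt.

```python
# Hypothetical illustration of step 3 -- the script name and flags are
# placeholders; the real entry point and arguments are documented in the
# TurboDiffusion repository README.
import subprocess

checkpoint = "checkpoints/TurboWan2.1-T2V-14B-720P"   # downloaded in step 2
prompt = "a cat wearing sunglasses surfing a wave"

subprocess.run(
    ["python", "inference.py", "--ckpt", checkpoint, "--prompt", prompt],
    check=True,
)
```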