GPD: Guided Progressive Distillation for Fast and High-Quality Video Generation
GPD addresses the slow sampling speed that limits diffusion-based video generation in AI applications, though it is an incremental improvement over existing distillation methods.
The paper tackles the high computational cost of diffusion models in video generation by proposing Guided Progressive Distillation (GPD), which reduces the number of sampling steps from 48 to 6 on the Wan2.1 model while maintaining competitive visual quality on VBench.
Diffusion models have achieved remarkable success in video generation; however, the high computational cost of the denoising process remains a major bottleneck. Existing approaches have shown promise in reducing the number of diffusion steps, but they often suffer from significant quality degradation when applied to video generation. We propose Guided Progressive Distillation (GPD), a framework that accelerates the diffusion process for fast and high-quality video generation. GPD introduces a novel training strategy in which a teacher model progressively guides a student model to operate with larger step sizes. The framework consists of two key components: (1) an online-generated training target that reduces optimization difficulty while improving computational efficiency, and (2) frequency-domain constraints in the latent space that promote the preservation of fine-grained details and temporal dynamics. Applied to the Wan2.1 model, GPD reduces the number of sampling steps from 48 to 6 while maintaining competitive visual quality on VBench. Compared with existing distillation methods, GPD demonstrates clear advantages in both pipeline simplicity and quality preservation.
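To make the two components concrete, the following is a minimal toy sketch in plain Python, not the authors' implementation. It assumes several things the abstract does not state: that the step count is halved at each stage (so 48 steps reach 6 after three stages), that the teacher produces the online target by taking two small Euler steps across the interval the student covers in one large step, and that the frequency-domain constraint is a weighted squared error between discrete Fourier transforms. The names `halving_schedule`, `teacher_velocity`, `online_target`, `frequency_loss`, and `distillation_loss`, the toy velocity field, and the frequency weights are all hypothetical.

```python
import cmath


def halving_schedule(start, end):
    """Step counts per stage, assuming each stage halves the count."""
    steps = [start]
    while steps[-1] > end:
        if steps[-1] % 2 != 0:
            raise ValueError("step count is not divisible by 2")
        steps.append(steps[-1] // 2)
    if steps[-1] != end:
        raise ValueError("end is not reachable by halving")
    return steps


def teacher_velocity(x, t):
    """Toy stand-in for a pretrained teacher: velocity of dx/dt = -x."""
    return [-xi for xi in x]


def euler_step(x, t, dt, velocity):
    """One explicit Euler step of size dt."""
    v = velocity(x, t)
    return [xi + dt * vi for xi, vi in zip(x, v)]


def online_target(x, t, big_dt, substeps=2):
    """Teacher covers one large student step with several small steps.

    The target is computed on the fly from the current sample, so no
    precomputed trajectory dataset is needed (an assumed reading of
    "online-generated training target").
    """
    dt = big_dt / substeps
    for i in range(substeps):
        x = euler_step(x, t + i * dt, dt, teacher_velocity)
    return x


def dft(signal):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def frequency_loss(student, target):
    """Weighted squared spectral error; higher frequencies weigh more.

    The weight 1 + min(f, n - f) is an illustrative choice, not taken
    from the paper.
    """
    n = len(student)
    s, t = dft(student), dft(target)
    total = 0.0
    for f in range(n):
        weight = 1 + min(f, n - f)
        total += weight * abs(s[f] - t[f]) ** 2
    return total / n


def distillation_loss(student_out, target, freq_weight=0.1):
    """Mean squared error plus the assumed frequency-domain term."""
    mse = sum((a - b) ** 2 for a, b in zip(student_out, target)) / len(target)
    return mse + freq_weight * frequency_loss(student_out, target)
```

In this toy setting, a student that simply reuses the teacher's velocity with one step of size 0.5 maps 1.0 to 0.5, while the two-step teacher target is 0.5625; the distillation loss measures that gap. The frequency weighting means an error concentrated in high frequencies costs more than a low-frequency error of equal energy, which is one plausible way to encourage preservation of fine detail.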