CV AI Mar 23

Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation

arXiv:2603.21864 66.3 h-index: 7
AI Analysis

This paper addresses the computational bottleneck in deploying video generation models, though it appears incremental because it builds on existing distillation techniques.

The paper tackles the problem of computational inefficiency in video generation by proposing a novel distillation framework for video diffusion models, which achieves stable few-step synthesis and significantly enhances perceptual fidelity and motion realism on VBench and VBench2 benchmarks.

Video generation has recently emerged as a central task in the field of generative AI. However, the substantial computational cost inherent in video synthesis makes model distillation a critical technique for efficient deployment. Despite its significance, there is a scarcity of methods specifically designed for video diffusion models. Prevailing approaches often directly adapt image distillation techniques, which frequently lead to artifacts such as oversaturation, temporal inconsistency, and mode collapse. To address these challenges, we propose a novel distillation framework tailored specifically for video diffusion models. Its core innovations include: (1) an adaptive regression loss that dynamically adjusts spatial supervision weights to prevent artifacts arising from excessive distribution shifts; (2) a temporal regularization loss to counteract temporal collapse, promoting smooth and physically plausible sampling trajectories; and (3) an inference-time frame interpolation strategy that reduces sampling overhead while preserving perceptual quality. Extensive experiments and ablation studies on the VBench and VBench2 benchmarks demonstrate that our method achieves stable few-step video synthesis, significantly enhancing perceptual fidelity and motion realism. It consistently outperforms existing distillation baselines across multiple metrics.
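The abstract names three components without giving formulas, so the following is only a minimal NumPy sketch of what each could look like, not the authors' implementation. Everything specific here is an assumption made for illustration: the exponential weighting `exp(-err / tau)`, the use of frame-to-frame differences as the temporal term, linear blending for frame interpolation, and all function and parameter names. Videos are assumed to be arrays of shape `(T, H, W, C)`.

```python
import numpy as np


def adaptive_regression_loss(student, teacher, tau=1.0):
    """Squared error with per-element weights that shrink as the error grows.

    Assumption: weights are exp(-err / tau), so regions where the student
    has drifted far from the teacher contribute less supervision.
    """
    err = (student - teacher) ** 2
    weights = np.exp(-err / tau)  # hypothetical weighting scheme
    return float(np.mean(weights * err))


def temporal_regularization_loss(student, teacher):
    """Mean squared mismatch between frame-to-frame differences.

    Assumption: a student whose frames collapse to a static image has
    near-zero differences, so it is penalised whenever the teacher moves.
    """
    d_student = np.diff(student, axis=0)
    d_teacher = np.diff(teacher, axis=0)
    return float(np.mean((d_student - d_teacher) ** 2))


def interpolate_frames(frames, factor=2):
    """Linearly blend consecutive frames: T frames -> (T - 1) * factor + 1.

    Assumption: plain linear blending stands in for whatever interpolation
    strategy the paper actually uses at inference time.
    """
    t = frames.shape[0]
    idx = np.arange((t - 1) * factor + 1)
    lo = idx // factor                    # index of the earlier source frame
    hi = np.minimum(lo + 1, t - 1)        # index of the later source frame
    alpha = (idx % factor) / factor       # blend position between lo and hi
    alpha = alpha.reshape(-1, *([1] * (frames.ndim - 1)))
    return (1.0 - alpha) * frames[lo] + alpha * frames[hi]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 8, 8, 3))
    student = teacher + 0.1 * rng.normal(size=teacher.shape)
    print("adaptive regression:", adaptive_regression_loss(student, teacher))
    print("temporal regularization:", temporal_regularization_loss(student, teacher))
    print("interpolated shape:", interpolate_frames(student, factor=2).shape)
```

Under these assumptions, generating 4 frames and interpolating with `factor=2` yields 7 frames, which is the sense in which interpolation would reduce sampling overhead: fewer frames pass through the diffusion model than appear in the final clip.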

