CV · AI · Nov 23, 2022

Latent Video Diffusion Models for High-Fidelity Long Video Generation

Tsinghua
arXiv:2211.13221v2 · 413 citations · h-index: 47
Originality: Incremental advance
AI Analysis

This work addresses realistic long video generation for AI content creation. It is a domain-specific, incremental advance over existing diffusion-based approaches.

The paper tackles the challenge of generating high-quality long videos by introducing lightweight video diffusion models that operate in a low-dimensional 3D latent space, achieving significant improvements in visual quality and length (over 1,000 frames) while reducing computational costs compared to previous methods.

AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length of generated videos are far from satisfactory. Diffusion models have shown remarkable results recently but require significant computational resources. To address this, we introduce lightweight video diffusion models by leveraging a low-dimensional 3D latent space, significantly outperforming previous pixel-space video diffusion models under a limited computational budget. In addition, we propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced. To further overcome the performance degradation issue for long video generation, we propose conditional latent perturbation and unconditional guidance that effectively mitigate the accumulated errors during the extension of video length. Extensive experiments on small domain datasets of different categories suggest that our framework generates more realistic and longer videos than previous strong baselines. We additionally provide an extension to large-scale text-to-video generation to demonstrate the superiority of our work. Our code and models will be made publicly available.
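The abstract names three ideas for extending video length (generation in latent space, conditional latent perturbation, and unconditional guidance) without giving their exact form. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes perturbation means adding Gaussian noise to the conditioning latent frames, that guidance follows the common classifier-free blend of conditional and unconditional noise predictions, and that extension is done clip by clip. All function names, the `noise_scale` value, the latent shape, and the toy predictor are hypothetical.

```python
import numpy as np

def perturb_condition(cond_latents, noise_scale, rng):
    # Assumed form of "conditional latent perturbation": Gaussian noise on the
    # conditioning frames, so the model tolerates imperfect generated context.
    return cond_latents + noise_scale * rng.standard_normal(cond_latents.shape)

def guided_prediction(eps_cond, eps_uncond, guidance_scale):
    # Classifier-free-style blend (an assumption about "unconditional guidance"):
    # scale 0 gives the unconditional prediction, scale 1 the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def extend_latents(first_clip, n_new_clips, cond_frames, predict_next, rng,
                   noise_scale=0.1):
    # Append clips one at a time, each conditioned on the perturbed last
    # `cond_frames` latent frames of the sequence generated so far.
    latents = first_clip
    for _ in range(n_new_clips):
        cond = perturb_condition(latents[-cond_frames:], noise_scale, rng)
        new_clip = predict_next(cond)
        latents = np.concatenate([latents, new_clip], axis=0)
    return latents

def toy_predict_next(cond, clip_len=16):
    # Hypothetical stand-in for a trained latent diffusion sampler:
    # repeats the mean conditioning frame `clip_len` times.
    return np.repeat(cond.mean(axis=0, keepdims=True), clip_len, axis=0)

rng = np.random.default_rng(0)
first_clip = rng.standard_normal((16, 4, 8, 8))  # (frames, channels, height, width)
video_latents = extend_latents(first_clip, n_new_clips=3, cond_frames=4,
                               predict_next=toy_predict_next, rng=rng)
print(video_latents.shape)  # (64, 4, 8, 8)
```

In a real system `toy_predict_next` would be replaced by a full denoising loop that calls `guided_prediction` at every sampling step, and the resulting latents would be decoded to pixels by a separate video decoder.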

Code Implementations: 1 repo
