CV, AI, LG, NE | Jun 7, 2022

Generating Long Videos of Dynamic Scenes

Berkeley, NVIDIA
arXiv:2206.03429v2 | 144 citations | h-index: 111
AI Analysis

This paper addresses the challenge of maintaining long-term consistency in video generation, which matters for applications such as simulation and media production. Its contribution is incremental: it improves on existing generative models rather than introducing a new paradigm.

The paper tackles the problem of generating long videos with realistic object motion, camera viewpoint changes, and new content over time, achieving improved temporal consistency and dynamics compared to existing methods.

We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never change due to over-reliance on inductive biases to provide temporal consistency, such as a single latent code that dictates content for the entire video. On the other extreme, without long-term consistency, generated videos may morph unrealistically between different scenes. To address these limitations, we prioritize the time axis by redesigning the temporal latent representation and learning long-term consistency from data by training on longer videos. To this end, we leverage a two-phase training strategy, where we separately train using longer videos at a low resolution and shorter videos at a high resolution. To evaluate the capabilities of our model, we introduce two new benchmark datasets with explicit focus on long-term temporal dynamics.
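The two-phase training strategy described in the abstract can be illustrated with a minimal sketch. It assumes each phase is characterized only by a clip length and a frame resolution. The names `PhaseConfig`, `sample_clip_window`, and `pixels_per_clip`, and all numeric values, are hypothetical; the abstract gives no concrete settings, and this is not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """One training phase: how long the clips are and how large the frames are."""
    name: str
    num_frames: int   # clip length in frames (hypothetical value)
    resolution: int   # side of a square frame in pixels (hypothetical value)


# Hypothetical settings chosen only for illustration.
LOW_RES_LONG = PhaseConfig("low_res_long", num_frames=128, resolution=64)
HIGH_RES_SHORT = PhaseConfig("high_res_short", num_frames=4, resolution=256)


def sample_clip_window(video_length: int, cfg: PhaseConfig, rng: random.Random):
    """Pick a random contiguous window of cfg.num_frames frames from a video."""
    if video_length < cfg.num_frames:
        raise ValueError("video is shorter than the requested clip length")
    start = rng.randrange(video_length - cfg.num_frames + 1)
    return start, start + cfg.num_frames


def pixels_per_clip(cfg: PhaseConfig) -> int:
    """Rough per-clip cost proxy: number of frames times pixels per frame."""
    return cfg.num_frames * cfg.resolution ** 2


rng = random.Random(0)
for cfg in (LOW_RES_LONG, HIGH_RES_SHORT):
    start, end = sample_clip_window(video_length=1000, cfg=cfg, rng=rng)
    print(cfg.name, end - start, "frames,", pixels_per_clip(cfg), "pixels per clip")
```

Under these assumed numbers, the long low-resolution clips and the short high-resolution clips have per-clip pixel counts of the same order of magnitude, which is one plausible reason to train the two phases separately rather than on long high-resolution videos.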

Code Implementations: 1 repo
