CV AI LG NE
Jun 7, 2022

Generating Long Videos of Dynamic Scenes

Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

Berkeley, NVIDIA

arXiv:2206.03429v2 31.5 144 citations h-index: 111

Originality: Incremental advance

AI Analysis

This work addresses the challenge of maintaining long-term consistency in video generation, which matters for applications such as simulation and media production, though it is an incremental improvement over existing generative models.

The paper tackles the problem of generating long videos with realistic object motion, camera viewpoint changes, and new content over time, achieving improved temporal consistency and dynamics compared to existing methods.

We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never change due to over-reliance on inductive biases to provide temporal consistency, such as a single latent code that dictates content for the entire video. On the other extreme, without long-term consistency, generated videos may morph unrealistically between different scenes. To address these limitations, we prioritize the time axis by redesigning the temporal latent representation and learning long-term consistency from data by training on longer videos. To this end, we leverage a two-phase training strategy, where we separately train using longer videos at a low resolution and shorter videos at a high resolution. To evaluate the capabilities of our model, we introduce two new benchmark datasets with explicit focus on long-term temporal dynamics.
