CVLG · Jun 1, 2022

Cascaded Video Generation for Videos In-the-Wild

arXiv:2206.00735v1
h-index: 94
Originality: Incremental advance
AI Analysis

This work addresses the efficient generation of high-resolution, multi-frame videos for computer vision applications. The advance is incremental, since it builds on existing coarse-to-fine approaches.

The paper proposes a cascaded model for video generation that first creates a low-resolution video to establish the global scene structure and then refines it at higher resolutions. The model achieves competitive results on UCF101 and Kinetics-600, and it scales to generating 256x256-pixel videos with 48 frames on BDD100K.

Videos can be created by first outlining a global view of the scene and then adding local details. Inspired by this idea, we propose a cascaded model for video generation that follows a coarse-to-fine approach. First, our model generates a low-resolution video, establishing the global scene structure, which is then refined by subsequent cascade levels operating at larger resolutions. We train each cascade level sequentially on partial views of the videos, which reduces the computational complexity of our model and makes it scalable to high-resolution videos with many frames. We empirically validate our approach on UCF101 and Kinetics-600, for which our model is competitive with the state of the art. We further demonstrate the scaling capabilities of our model by training a three-level model on the BDD100K dataset, which generates 256x256-pixel videos with 48 frames.
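To make the scaling argument concrete, here is a minimal sketch of the size bookkeeping behind a three-level cascade ending at 48 frames of 256x256 pixels. It is not the authors' implementation. The per-level upscaling factors (2x in space and 2x in time), the reading of a "partial view" as a temporal window, and the names `Level`, `cascade_schedule`, and `partial_view` are all assumptions made for illustration; the abstract states only the final output size and the number of levels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    """Output size of one cascade level (hypothetical helper type)."""
    frames: int
    height: int
    width: int

    @property
    def voxels(self) -> int:
        # Number of pixel positions in the video (channels ignored).
        return self.frames * self.height * self.width


def cascade_schedule(final: Level, num_levels: int,
                     spatial_factor: int = 2,
                     temporal_factor: int = 2) -> List[Level]:
    """Return output sizes from the coarsest to the finest level.

    ASSUMPTION: every level upscales the previous one by fixed factors.
    The factors here are illustrative, not taken from the paper.
    """
    levels = [final]
    for _ in range(num_levels - 1):
        finer = levels[0]
        levels.insert(0, Level(finer.frames // temporal_factor,
                               finer.height // spatial_factor,
                               finer.width // spatial_factor))
    return levels


def partial_view(level: Level, window_frames: int) -> Level:
    """Size of a training crop at a given level.

    ASSUMPTION: a "partial view" is modelled as a temporal window of
    the level's output, so a training step never sees the full video.
    """
    return Level(min(window_frames, level.frames), level.height, level.width)


levels = cascade_schedule(Level(frames=48, height=256, width=256), num_levels=3)
for index, level in enumerate(levels, start=1):
    crop = partial_view(level, window_frames=12)
    print(f"level {index}: output {level.frames}x{level.height}x{level.width}, "
          f"training crop covers {crop.voxels / level.voxels:.0%} of the output")
```

Under these assumed factors the coarsest level works on 12 frames of 64x64 pixels, and a 12-frame training window at the finest level covers a quarter of the full 48-frame output. That illustrates why training on partial views keeps the per-step cost bounded as resolution and length grow.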

