CV · LG · IV · Apr 4, 2020

Temporal Shift GAN for Large Scale Video Generation

arXiv:2004.01823v2 · 10 citations
AI Analysis

This paper addresses the challenge of generating realistic videos for applications in entertainment and simulation, though the contribution appears incremental because it builds on existing 2D architectures.

The paper tackles the problem of video generation by proposing a network architecture that improves spatio-temporal consistency without using costly 3D models, achieving state-of-the-art inception scores on UCF-101 and introducing a new quantitative measure and dataset.
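Since the headline result is an inception score, a minimal sketch of how that metric is conventionally computed may help. It assumes the standard definition, the exponential of the mean KL divergence between each sample's predicted class distribution and the marginal over all samples. The function name `inception_score` and the toy probability lists are hypothetical; in practice the probabilities would come from a pretrained classifier applied to generated videos, which is not shown here.

```python
import math


def inception_score(class_probs):
    """Inception score from per-sample class probabilities.

    class_probs: list of N lists, each holding K probabilities that sum to 1
    (hypothetical classifier outputs for N generated samples).
    """
    num_samples = len(class_probs)
    num_classes = len(class_probs[0])

    # Marginal class distribution p(y), averaged over all samples.
    marginal = [
        sum(p[j] for p in class_probs) / num_samples for j in range(num_classes)
    ]

    # Mean KL divergence KL(p(y|x) || p(y)); zero-probability terms contribute 0.
    mean_kl = 0.0
    for p in class_probs:
        mean_kl += sum(
            pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0
        )
    mean_kl /= num_samples

    return math.exp(mean_kl)


# Confident and diverse predictions score high; uninformative ones score 1.
confident = inception_score([[1.0, 0.0], [0.0, 1.0]])
uninformative = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

The score grows when each sample is classified confidently and the samples together cover many classes, which is why it is read as a joint proxy for quality and diversity.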

Video generation models have become increasingly popular in the last few years; however, the standard 2D architectures used today lack natural spatio-temporal modelling capabilities. In this paper, we present a network architecture for video generation that models spatio-temporal consistency without resorting to costly 3D architectures. The architecture facilitates information exchange between neighboring time points, which improves the temporal consistency of both the high-level structure and the low-level details of the generated frames. The approach achieves state-of-the-art quantitative performance, as measured by the inception score on the UCF-101 dataset, as well as better qualitative results. We also introduce a new quantitative measure (S3) that uses downstream tasks for evaluation. Moreover, we present a new multi-label dataset, MaisToy, which enables us to evaluate the generalization of the model.
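The abstract does not spell out how information is exchanged between neighboring time points. As a rough illustration only, the sketch below shows the generic temporal shift idea suggested by the title: a fraction of the feature channels is moved one step forward or backward along the time axis, so that a following 2D layer sees features from adjacent frames. The function name `temporal_shift`, the `shift_fraction` parameter, and the zero padding at the sequence ends are assumptions, not details taken from the paper.

```python
def temporal_shift(frames, shift_fraction=0.25):
    """Shift a fraction of channels one step along the time axis.

    frames: list of T frames, each a list of C per-channel values.
    Spatial dimensions are omitted because the shift treats every
    pixel location identically.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = int(num_channels * shift_fraction)  # channels per shift direction

    # Start from a copy so that unshifted channels keep their values.
    shifted = [list(frame) for frame in frames]
    for t in range(num_frames):
        for c in range(fold):
            # First group: take the value from the next frame (zero at the end).
            shifted[t][c] = frames[t + 1][c] if t + 1 < num_frames else 0.0
        for c in range(fold, 2 * fold):
            # Second group: take the value from the previous frame (zero at the start).
            shifted[t][c] = frames[t - 1][c] if t - 1 >= 0 else 0.0
    return shifted


# Three frames, four channels; each value encodes frame * 10 + channel.
video = [[t * 10.0 + c for c in range(4)] for t in range(3)]
result = temporal_shift(video, shift_fraction=0.25)
```

The shift itself adds no parameters and no multiplications, which is consistent with the stated goal of avoiding costly 3D architectures.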

Code Implementations: 1 repo
