Temporal Generative Adversarial Nets with Singular Value Clipping
This work addresses video generation for AI and multimedia applications, representing an incremental improvement over existing GAN-based methods.
The paper tackles video generation from unlabeled data. It proposes Temporal Generative Adversarial Nets (TGAN), which splits generation between a temporal generator and an image generator, and reports experiments in support of the method's effectiveness.
In this paper, we propose a generative model, Temporal Generative Adversarial Nets (TGAN), that can learn a semantic representation of unlabeled videos and generate videos. Unlike existing Generative Adversarial Nets (GAN)-based methods, which generate videos with a single generator consisting of 3D deconvolutional layers, our model uses two different types of generators: a temporal generator and an image generator. The temporal generator takes a single latent variable as input and outputs a set of latent variables, each of which corresponds to one image frame of a video. The image generator transforms this set of latent variables into a video. To deal with the instability of training a GAN with such advanced networks, we adopt a recently proposed model, Wasserstein GAN, and propose a novel method to train it stably in an end-to-end manner. The experimental results demonstrate the effectiveness of our methods.
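
The two-generator data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: each generator is reduced to a single linear map with a tanh, and all sizes and names (LATENT_DIM, NUM_FRAMES, W_temporal, W_image) are hypothetical, chosen only to make the tensor shapes visible. The helper clip_singular_values rests on a further assumption: that the stabilization method named in the title caps the singular values of a weight matrix at a fixed bound. The abstract itself does not spell out the procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
LATENT_DIM = 8            # size of the single input latent variable
NUM_FRAMES = 4            # number of frames in the generated video
FRAME_H, FRAME_W = 6, 6   # spatial size of each frame

# Stand-in parameters: one linear map per generator instead of deep networks.
W_temporal = rng.standard_normal((LATENT_DIM, NUM_FRAMES * LATENT_DIM)) * 0.1
W_image = rng.standard_normal((LATENT_DIM, FRAME_H * FRAME_W)) * 0.1


def temporal_generator(z0):
    """Map one latent vector to a set of latent vectors, one per frame."""
    z_frames = np.tanh(z0 @ W_temporal)
    return z_frames.reshape(NUM_FRAMES, LATENT_DIM)


def image_generator(z_frames):
    """Map each per-frame latent vector to an image and stack them into a video."""
    frames = np.tanh(z_frames @ W_image)
    return frames.reshape(NUM_FRAMES, FRAME_H, FRAME_W)


def clip_singular_values(weight, max_sv=1.0):
    """Return a copy of `weight` with its singular values capped at `max_sv`.

    Assumption: this is one plausible reading of "singular value clipping";
    the exact procedure is not given in the abstract.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u * np.minimum(s, max_sv)) @ vt


# One latent variable in, one video (frames x height x width) out.
z0 = rng.standard_normal(LATENT_DIM)
video = image_generator(temporal_generator(z0))
```

In this sketch the temporal generator alone decides how the latent code changes from frame to frame, while the image generator is applied to each frame's latent vector independently. That separation is what distinguishes the design from a single generator built from 3D deconvolutional layers.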