CV · Aug 13, 2020

Recurrent Deconvolutional Generative Adversarial Networks with Application to Text Guided Video Generation

arXiv:2008.05856v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of synthesizing realistic videos from text descriptions, with applications in media and AI. It represents an incremental improvement over existing methods.

The paper tackles video generation from text descriptions by proposing a recurrent deconvolutional generative adversarial network (RD-GAN) that addresses the frame discontinuity and text-free generation schemes of existing methods. It reports good performance on tasks such as conditional video generation and video prediction.

This paper proposes a novel model for video generation and, in particular, attempts to deal with the problem of video generation from text descriptions, i.e., synthesizing realistic videos conditioned on given texts. Existing video generation methods cannot be easily adapted to handle this task well, due to the frame discontinuity issue and their text-free generation schemes. To address these problems, we propose a recurrent deconvolutional generative adversarial network (RD-GAN), which includes a recurrent deconvolutional network (RDN) as the generator and a 3D convolutional neural network (3D-CNN) as the discriminator. The RDN is a deconvolutional version of a conventional recurrent neural network, which can model the long-range temporal dependency of generated video frames well and make good use of conditional information. The proposed model can be jointly trained by pushing the RDN to generate realistic videos so that the 3D-CNN cannot distinguish them from real ones. We apply the proposed RD-GAN to a series of tasks including conventional video generation, conditional video generation, video prediction and video classification, and demonstrate its effectiveness by achieving good performance.
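The idea of a generator that is both recurrent and deconvolutional can be illustrated with a toy sketch. This is not the authors' RDN: it assumes 1-D "frames", a single shared hidden state, and fixed hand-picked weights, and the names `transposed_conv1d`, `generate_video`, `rec_weight`, and `cond_weight` are hypothetical. It only shows how a hidden state carried across time steps, re-conditioned on a text embedding at each step, can be upsampled into successive frames. The 3D-CNN discriminator and adversarial training are omitted.

```python
import math
import random


def transposed_conv1d(x, kernel, stride=2):
    """1-D transposed convolution (deconvolution) that upsamples x.

    Output length is (len(x) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            # Each input element "stamps" a scaled copy of the kernel.
            out[i * stride + j] += xi * kj
    return out


def generate_video(z, text_emb, num_frames,
                   rec_weight=0.9, cond_weight=0.5,
                   kernel=(0.25, 0.5, 0.25)):
    """Toy recurrent deconvolutional generator (illustrative assumption only).

    z        : noise vector (list of floats)
    text_emb : text embedding of the same length, used as the condition
    Returns a list of num_frames 1-D frames.
    """
    assert len(z) == len(text_emb)
    # Initial hidden state mixes noise with the text condition.
    h = [math.tanh(zi + cond_weight * ci) for zi, ci in zip(z, text_emb)]
    frames = []
    for _ in range(num_frames):
        # Recurrent update: the new state depends on the previous state
        # and on the text condition, so consecutive frames stay related.
        h = [math.tanh(rec_weight * hi + cond_weight * ci)
             for hi, ci in zip(h, text_emb)]
        # Deconvolution maps the low-resolution state to a larger frame.
        frames.append(transposed_conv1d(h, kernel))
    return frames


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]
text_emb = [0.2, -0.1, 0.4, 0.0]  # hypothetical text embedding
video = generate_video(z, text_emb, num_frames=3)
```

Here `video` holds three frames of length 9, each produced from the same evolving hidden state. In a full model of the kind the abstract describes, the weights would be learned and a discriminator would score whole frame sequences.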

