Video Imagination from a Single Image with Transformation Generation
This work addresses a challenging video synthesis task with applications such as content creation, but it appears incremental because it builds on existing transformation-based and adversarial methods.
The paper tackles the generation of multiple imaginary videos from a single image, addressing the high dimensionality of pixel space and the ambiguity of potential motions. It reports promising image quality assessment results and produces diverse five-frame videos of acceptable perceptual quality.
In this work, we focus on a challenging task: synthesizing multiple imaginary videos given a single image. The major problems come from the high dimensionality of pixel space and the ambiguity of potential motions. To overcome these problems, we propose a new framework that produces imaginary videos by transformation generation. The generated transformations are applied to the original image in a novel volumetric merge network to reconstruct the frames of the imaginary video. By sampling different latent variables, our method can output different imaginary video samples. The framework is trained adversarially with unsupervised learning. For evaluation, we propose a new assessment metric, $RIQA$. In experiments, we test on three datasets ranging from synthetic data to natural scenes. Our framework achieves promising performance in image quality assessment. Visual inspection indicates that it can successfully generate diverse five-frame videos of acceptable perceptual quality.
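The pipeline described above (sample a latent variable, generate transformations, apply them to the input image, merge the results into frames) can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract does not specify the transformation family or the internals of the volumetric merge network, so the sketch assumes affine transformations and a per-pixel softmax blend, and it replaces the trained networks with fixed random linear maps. The names `apply_affine`, `imagine_video`, `num_transforms`, and `scale` are hypothetical.

```python
import numpy as np


def apply_affine(image, theta):
    """Warp a 2-D image with a 2x3 affine matrix (nearest-neighbour sampling)."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Normalised coordinates in [-1, 1], so theta does not depend on resolution.
    xn = 2.0 * xs / (w - 1) - 1.0
    yn = 2.0 * ys / (h - 1) - 1.0
    src_x = theta[0, 0] * xn + theta[0, 1] * yn + theta[0, 2]
    src_y = theta[1, 0] * xn + theta[1, 1] * yn + theta[1, 2]
    # Back to pixel indices, clamped to the image border.
    px = np.clip(np.rint((src_x + 1.0) * (w - 1) / 2.0), 0, w - 1).astype(int)
    py = np.clip(np.rint((src_y + 1.0) * (h - 1) / 2.0), 0, h - 1).astype(int)
    return image[py, px]


def imagine_video(image, z, num_frames=5, num_transforms=4, scale=0.05):
    """Map a latent vector z to a short video of transformed copies of image.

    Assumption: fixed random linear maps stand in for the trained generator
    and merge network; a real model would learn these adversarially.
    """
    h, w = image.shape
    rng = np.random.default_rng(0)  # fixed stand-in "weights"
    w_theta = rng.normal(size=(num_frames, num_transforms, 6, z.size))
    w_mask = rng.normal(size=(num_frames, num_transforms, h, w, z.size))
    identity = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])

    frames = []
    for t in range(num_frames):
        # Transformation generation: small perturbations of the identity warp.
        thetas = identity + scale * (w_theta[t] @ z)  # shape (K, 6)
        candidates = np.stack(
            [apply_affine(image, th.reshape(2, 3)) for th in thetas]
        )  # shape (K, h, w)
        # Merge (assumed form): per-pixel softmax weights over the K candidates.
        logits = w_mask[t] @ z  # shape (K, h, w)
        logits = logits - logits.max(axis=0, keepdims=True)
        weights = np.exp(logits)
        weights = weights / weights.sum(axis=0, keepdims=True)
        frames.append((weights * candidates).sum(axis=0))
    return np.stack(frames)  # shape (num_frames, h, w)


if __name__ == "__main__":
    image = np.random.default_rng(1).random((16, 16))
    latent_rng = np.random.default_rng(2)
    video_a = imagine_video(image, latent_rng.normal(size=8))
    video_b = imagine_video(image, latent_rng.normal(size=8))
    print(video_a.shape, np.abs(video_a - video_b).mean())
```

Because every frame is a convex combination of warped copies of the input, the output stays within the input's intensity range, which reflects the motivation for predicting transformations rather than raw pixels. Drawing a different `z` changes both the warps and the merge weights, giving a different video from the same image.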