ArrowGAN: Learning to Generate Videos by Learning Arrow of Time
This work addresses the challenge of generating realistic videos, a significant problem in computer vision and generative AI, by introducing a self-supervisory auxiliary task: classifying the arrow of time.
This paper introduces ArrowGAN, a framework for video generation that uses an auxiliary task in which discriminators classify the 'arrow of time' to help generators synthesize forward-running videos. By combining this with recent conditional image generation techniques, ArrowGAN achieves state-of-the-art performance in categorical video generation, improving video inception score and Fréchet video distance on the Weizmann, UCFsports, and UCF-101 datasets.
Training GANs on videos is even more challenging than on images because videos have a distinguished dimension: time. Although recent methods design dedicated architectures that account for time, generated videos are still readily distinguishable from real ones. In this paper, we introduce the ArrowGAN framework, in which the discriminators learn to classify the arrow of time as an auxiliary task and the generators try to synthesize forward-running videos. We argue that the auxiliary task should be chosen carefully with regard to the target domain. In addition, we explore categorical ArrowGAN, which builds on the ArrowGAN framework with recent techniques from conditional image generation, achieving state-of-the-art performance on categorical video generation. Our extensive experiments validate the effectiveness of the arrow of time as a self-supervisory task and demonstrate that every component of categorical ArrowGAN improves video inception score and Fréchet video distance on three datasets: Weizmann, UCFsports, and UCF-101.
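The arrow-of-time auxiliary task can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not give the loss formulation, so the binary cross-entropy objective, the label convention (1.0 for forward, 0.0 for backward), and every name below (`reverse_time`, `make_aot_batch`, `discriminator_aot_loss`, `generator_aot_loss`, `toy_aot_head`) are assumptions made for illustration. A clip is modeled as a list of frames, and the hypothetical `toy_aot_head` stands in for the discriminator's auxiliary classification head.

```python
import math


def reverse_time(video):
    """Return the clip played backward; `video` is a sequence of frames."""
    return list(video)[::-1]


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for a single logit.

    Assumed label convention: 1.0 = forward-running, 0.0 = backward-running.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def make_aot_batch(videos):
    """Pair each clip with its time-reversed copy and an arrow-of-time label."""
    batch = []
    for video in videos:
        batch.append((list(video), 1.0))          # forward clip
        batch.append((reverse_time(video), 0.0))  # same clip, reversed
    return batch


def discriminator_aot_loss(aot_head, videos):
    """Auxiliary discriminator loss: tell forward clips from reversed ones."""
    batch = make_aot_batch(videos)
    return sum(bce_with_logits(aot_head(v), y) for v, y in batch) / len(batch)


def generator_aot_loss(aot_head, fake_videos):
    """Auxiliary generator loss: generated clips should look forward-running."""
    losses = [bce_with_logits(aot_head(v), 1.0) for v in fake_videos]
    return sum(losses) / len(losses)


def toy_aot_head(video):
    """Hypothetical stand-in for a learned head: each frame is one scalar,
    and a rising value over time is scored as 'forward'."""
    return float(video[-1] - video[0])


if __name__ == "__main__":
    clip = [0.0, 1.0, 2.0]  # toy clip whose frames are scalars
    print(discriminator_aot_loss(toy_aot_head, [clip]))
    print(generator_aot_loss(toy_aot_head, [clip]))
    print(generator_aot_loss(toy_aot_head, [reverse_time(clip)]))
```

In this toy setup the generator loss is larger for the reversed clip than for the forward one, which is the pressure toward forward-running videos that the abstract describes. In a real model the auxiliary terms would be added to the usual adversarial losses, with a weighting that this sketch does not attempt to specify.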