MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs
This work addresses video generation for AI and multimedia applications, offering an incremental improvement by integrating motion learning into existing image generators.
The paper tackles the problem of unrealistic videos from separate content and motion generation by introducing MotionVideoGAN, a framework that learns a motion space from image pairs to ensure content consistency, achieving state-of-the-art performance on the UCF101 dataset.
Video generation has achieved rapid progress benefiting from high-quality renderings provided by powerful image generators. We regard the video synthesis task as generating a sequence of images sharing the same contents but varying in motions. However, most previous video synthesis frameworks based on pre-trained image generators treat content and motion generation separately, leading to unrealistic generated videos. Therefore, we design a novel framework to build the motion space, aiming to achieve content consistency and fast convergence for video generation. We present MotionVideoGAN, a novel video generator synthesizing videos based on the motion space learned by pre-trained image pair generators. Firstly, we propose an image pair generator named MotionStyleGAN to generate image pairs sharing the same contents and producing various motions. Then we manage to acquire motion codes to edit one image in the generated image pairs and keep the other unchanged. The motion codes help us edit images within the motion space since the edited image shares the same contents with the other unchanged one in image pairs. Finally, we introduce a latent code generator to produce latent code sequences using motion codes for video generation. Our approach achieves state-of-the-art performance on the most complex video dataset ever used for unconditional video generation evaluation, UCF101.