Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis
This work addresses the challenge of conditioned image-to-video synthesis for computer vision applications, representing an incremental improvement over existing GAN-based methods.
The paper tackles the problem of generating videos with specific content and motion patterns by proposing Dual-MTGAN, which learns disentangled content and motion representations to perform deterministic motion transfer and stochastic motion generation from images; quantitative and qualitative results support its effectiveness and robustness.
Generating videos with content and motion variations is a challenging task in computer vision. While the recent development of GANs allows video generation from latent representations, it is not easy to produce videos with particular content or motion patterns of interest. In this paper, we propose Dual Motion Transfer GAN (Dual-MTGAN), which takes image and video data as inputs while learning disentangled content and motion representations. Our Dual-MTGAN is able to perform both deterministic motion transfer and stochastic motion generation. Given an input image, the former preserves the input content and transfers motion patterns observed from another video sequence, while the latter directly produces videos with plausible yet diverse motion patterns. The proposed model is trained in an end-to-end manner, without requiring pre-defined motion features such as pose or facial landmarks. Our quantitative and qualitative results confirm the effectiveness and robustness of our model in addressing such conditioned image-to-video tasks.
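To make the distinction between the two modes concrete, the following is a minimal toy sketch, not the Dual-MTGAN architecture itself. All function names, code dimensions, and the stand-in "encoders" and "decoder" are hypothetical placeholders for learned networks; the only point illustrated is that both modes share one content code taken from the input image, while the motion codes either come from a driving video (deterministic transfer) or are sampled from a prior (stochastic generation).

```python
import random

# Hypothetical code sizes, chosen only for illustration.
CONTENT_DIM, MOTION_DIM = 4, 2


def encode_content(image):
    # Stand-in for a learned content encoder: keep the first CONTENT_DIM values.
    return list(image[:CONTENT_DIM])


def encode_motion(video):
    # Stand-in for a learned motion encoder: one code per frame transition,
    # computed here as a simple difference between consecutive frames.
    return [[b[i] - a[i] for i in range(MOTION_DIM)]
            for a, b in zip(video, video[1:])]


def sample_motion(num_steps, rng):
    # Stochastic mode: draw motion codes from a standard normal prior (assumed).
    return [[rng.gauss(0.0, 1.0) for _ in range(MOTION_DIM)]
            for _ in range(num_steps)]


def decode(content, motion_codes):
    # Stand-in for a generator: each output frame pairs the fixed content code
    # with one motion code, so content stays constant across frames.
    return [content + m for m in motion_codes]


def motion_transfer(image, driving_video):
    # Deterministic: content from the image, motion from the driving video.
    return decode(encode_content(image), encode_motion(driving_video))


def motion_generation(image, num_frames, seed):
    # Stochastic: content from the image, motion sampled from the prior.
    rng = random.Random(seed)
    return decode(encode_content(image), sample_motion(num_frames, rng))
```

In this sketch, calling `motion_transfer` twice with the same image and driving video yields identical frames, whereas `motion_generation` yields different motion for different seeds while the content part of every frame stays fixed.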