One Style is All you Need to Generate a Video
This work addresses video generation and editing for applications in media and entertainment, offering a novel approach to motion transfer without landmarks.
The paper tackles video generation by proposing a style-based conditional model that learns disentangled dynamic and content representations, enabling independent manipulation and transfer of motion between different actors without preprocessing. The method significantly enhances video quality compared to prevalent methods.
In this paper, we propose a style-based conditional video generative model. We introduce a novel temporal generator based on a set of learned sinusoidal bases. Our method learns dynamic representations of various actions that are independent of image content and can be transferred between different actors. Beyond a significant enhancement of video quality compared to prevalent methods, we demonstrate that the disentangled dynamic and content representations can be manipulated independently, and that temporal GAN inversion can retrieve a video's motion and transfer it from one content or identity to another without further preprocessing such as landmark-point extraction.
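
To make the idea of a temporal generator built on sinusoidal bases more concrete, the following is a minimal sketch and not the paper's implementation. It assumes each frame's motion code is a linear combination of K sinusoids of the frame index; the frequencies, phases, and mixing weights are learned in practice, and random values stand in for them here. All names (`sinusoidal_motion_codes`, `freqs`, `phases`, `weights`) and the concatenation of content and motion codes are illustrative assumptions.

```python
import math
import random


def sinusoidal_motion_codes(num_frames, freqs, phases, weights):
    """Return one motion code per frame from a bank of sinusoidal bases.

    freqs, phases: K values each (hypothetical learned parameters).
    weights: K rows of D mixing coefficients (also hypothetical).
    """
    dim = len(weights[0])
    codes = []
    for t in range(num_frames):
        # Evaluate every basis function at frame index t.
        basis = [math.sin(2.0 * math.pi * f * t + p) for f, p in zip(freqs, phases)]
        # Mix the K basis responses into a D-dimensional motion code.
        code = [sum(b * row[d] for b, row in zip(basis, weights)) for d in range(dim)]
        codes.append(code)
    return codes


# Random stand-ins for parameters that would be learned during training.
rng = random.Random(0)
K, D, T = 8, 4, 12
freqs = [rng.uniform(0.01, 0.5) for _ in range(K)]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(K)]
weights = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]

motion = sinusoidal_motion_codes(T, freqs, phases, weights)

# Two different content (identity) codes, assumed fixed across all frames.
content_a = [rng.gauss(0.0, 1.0) for _ in range(D)]
content_b = [rng.gauss(0.0, 1.0) for _ in range(D)]

# Motion transfer in this toy setting: reuse the same motion codes with a
# different content code. A real model would feed these to an image generator.
styles_a = [content_a + m for m in motion]
styles_b = [content_b + m for m in motion]
```

In this sketch, `styles_a` and `styles_b` share their motion half frame by frame and differ only in the content half, which is the property that disentanglement is meant to provide.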