GenDeF: Learning Generative Deformation Field for Video Generation
This method addresses video generation for AI and multimedia applications, aiming for better visual quality and easier editing, though its approach is incremental.
The authors render a video by warping a static image with a generative deformation field, which improves visual quality and temporal consistency; they report superior results on three benchmarks.
We offer a new perspective on the task of video generation. Instead of directly synthesizing a sequence of frames, we propose to render a video by warping one static image with a generative deformation field (GenDeF). This pipeline has three appealing advantages. First, we can reuse a well-trained image generator to synthesize the static image (also called the canonical image), which eases the difficulty of producing a video and results in better visual quality. Second, a deformation field is easily converted to optical flow, making it possible to apply explicit structural regularization for motion modeling, which leads to temporally consistent results. Third, the disentanglement of content and motion lets users process a synthesized video by processing its corresponding static image, without any tuning, which facilitates applications such as video editing, keypoint tracking, and video segmentation. Both qualitative and quantitative results on three common video generation benchmarks demonstrate the superiority of our GenDeF method.
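The core rendering idea, warping one canonical image with a per-frame deformation field, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `warp_image` and `render_video`, the backward-warping convention (each output pixel stores an offset into the canonical image), and the use of bilinear sampling with border clamping are all assumptions made for illustration. In GenDeF the deformation fields come from a learned generator; here they are simply given as arrays.

```python
import numpy as np


def warp_image(canonical, deformation):
    """Backward-warp a canonical image with a dense deformation field.

    canonical:   float array of shape (H, W, C).
    deformation: float array of shape (H, W, 2); deformation[y, x] = (dx, dy)
                 means output pixel (x, y) samples the canonical image at
                 (x + dx, y + dy). This convention is an assumption.
    """
    h, w = canonical.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sampling positions in the canonical image, clamped to the border.
    sx = np.clip(xs + deformation[..., 0], 0, w - 1)
    sy = np.clip(ys + deformation[..., 1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Bilinear interpolation weights, broadcast over the channel axis.
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]

    top = canonical[y0, x0] * (1 - wx) + canonical[y0, x1] * wx
    bottom = canonical[y1, x0] * (1 - wx) + canonical[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def render_video(canonical, deformations):
    """Render one frame per deformation field; returns shape (T, H, W, C)."""
    return np.stack([warp_image(canonical, d) for d in deformations])


# Toy usage: a 3x4 single-channel "image" and two hand-made fields.
canonical = np.arange(12, dtype=float).reshape(3, 4, 1)
identity = np.zeros((3, 4, 2))      # no motion: frame equals canonical image
shift_right = np.zeros((3, 4, 2))
shift_right[..., 0] = 1.0           # every pixel samples one column to its right
video = render_video(canonical, [identity, shift_right])
```

Because every frame is a resampling of the same canonical image, an edit applied to `canonical` shows up in all frames after re-rendering, which is the intuition behind the content and motion disentanglement described above.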