CV · Mar 21, 2024

Explorative Inbetweening of Time and Space

Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang

arXiv:2403.14611v121.522 citations · h-index: 11 · ECCV

Originality: Incremental advance

AI Analysis

This work addresses the challenge of controllable video generation for applications such as animation and scene synthesis. The advance is incremental, since it builds on existing image-to-video models.

The paper tackles video generation with controlled camera and subject motion given only a start frame and an end frame. It introduces Time Reversal Fusion, a sampling strategy that fuses forward and backward denoising paths without additional training, and reports outperforming the closest existing methods on all subtasks.

We introduce bounded generation as a generalized task to control video generation to synthesize arbitrary camera and subject motion based only on a given start and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without additional training or fine-tuning of the original model. This is achieved through the proposed new sampling strategy, which we call Time Reversal Fusion, that fuses the temporally forward and backward denoising paths conditioned on the start and end frame, respectively. The fused path results in a video that smoothly connects the two frames, generating inbetweening of faithful subject motion, novel views of static scenes, and seamless video looping when the two bounding frames are identical. We curate a diverse evaluation dataset of image pairs and compare against the closest existing methods. We find that Time Reversal Fusion outperforms related work on all subtasks, exhibiting the ability to generate complex motions and 3D-consistent views guided by bounded frames. See project page at https://time-reversal.github.io.
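The fusion idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for an image-to-video denoiser, and fusing the two paths by a plain average at every step is an assumption made for this example. The sketch only shows the control flow: one pass conditioned on the start frame, one pass on the time-reversed sequence conditioned on the end frame, and a fusion of the two after flipping the second pass back.

```python
import numpy as np


def toy_denoise_step(latents, cond_frame, step_size=0.5):
    """Hypothetical stand-in for one image-to-video denoising step.

    latents has shape (T, H, W). Frames are pulled toward cond_frame with a
    weight that decays from 1 at the first frame to 0 at the last frame,
    mimicking a model whose conditioning is strongest near the given image.
    """
    num_frames = latents.shape[0]
    pull = np.linspace(1.0, 0.0, num_frames).reshape(-1, 1, 1)
    return latents + step_size * pull * (cond_frame[None] - latents)


def time_reversal_fusion(start_frame, end_frame, num_frames=8, num_steps=30, seed=0):
    """Toy bounded generation: fuse forward and time-reversed denoising paths."""
    rng = np.random.default_rng(seed)
    latents = rng.standard_normal((num_frames,) + start_frame.shape)
    for _ in range(num_steps):
        # Forward path: denoise in natural time order, conditioned on the start frame.
        forward = toy_denoise_step(latents, start_frame)
        # Backward path: denoise the time-reversed sequence conditioned on the
        # end frame, then flip the result back to natural time order.
        backward = toy_denoise_step(latents[::-1], end_frame)[::-1]
        # Fusion: plain average of the two paths (an assumption of this sketch).
        latents = 0.5 * (forward + backward)
    return latents


if __name__ == "__main__":
    start = np.zeros((4, 4))
    end = np.ones((4, 4))
    video = time_reversal_fusion(start, end)
    print(video.shape)
    print(video.mean(axis=(1, 2)).round(2))
```

With this toy denoiser the result collapses to a cross-fade between the two bounding frames, so it demonstrates only that the fused path is anchored at both ends. Any plausible motion or 3D consistency would have to come from a real pretrained image-to-video model in place of the stand-in.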
