MotionFlow: Learning Implicit Motion Flow for Complex Camera Trajectory Control in Video Generation
This work addresses video generation for applications that require precise camera control, though it appears incremental because it builds on existing stable diffusion and image-to-video networks.
The paper tackles the problem of generating videos guided by camera trajectories while keeping the result consistent when both camera and object motions are present. It proposes integrating both motions into a single pixel-level motion flow, and the authors report that the resulting model outperforms state-of-the-art methods by a large margin.
Generating videos guided by camera trajectories poses significant challenges in achieving consistency and generalizability, particularly when both camera and object motions are present. Existing approaches often attempt to learn these motions separately, which may lead to confusion regarding the relative motion between the camera and the objects. To address this challenge, we propose a novel approach that integrates both camera and object motions by converting them into the motion of corresponding pixels. Utilizing a stable diffusion network, we effectively learn reference motion maps in relation to the specified camera trajectory. These maps, along with an extracted semantic object prior, are then fed into an image-to-video network to generate the desired video that can accurately follow the designated camera trajectory while maintaining consistent object motions. Extensive experiments verify that our model outperforms SOTA methods by a large margin.
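To make the idea of "converting camera and object motions into the motion of corresponding pixels" concrete, the following is a minimal geometric sketch, not the paper's method (the paper learns its reference motion maps with a diffusion network). It assumes a pinhole camera, a known depth map, and a known relative pose; the function name `pixel_motion_flow` and all parameter names are hypothetical and chosen for illustration only.

```python
import numpy as np

def pixel_motion_flow(depth, K, R, t, object_motion=None):
    """Illustrative only: per-pixel flow induced by camera and object motion.

    depth:         (H, W) positive depths in the first camera's frame.
    K:             (3, 3) pinhole intrinsics (assumed known).
    R, t:          rigid transform mapping first-camera coordinates to
                   second-camera coordinates (X2 = R @ X1 + t).
    object_motion: optional (H, W, 3) 3D displacement of the scene point
                   seen at each pixel, in first-camera coordinates.
    Returns an (H, W, 2) array of pixel displacements (du, dv).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels

    # Back-project every pixel to a 3D point: X = depth * K^-1 [u, v, 1]^T
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]

    # Object motion moves the 3D points before the camera moves.
    if object_motion is not None:
        points = points + object_motion

    # Express the points in the second camera's frame and re-project.
    points_cam2 = points @ R.T + t
    proj = points_cam2 @ K.T
    uv2 = proj[..., :2] / proj[..., 2:3]

    # Camera and object motion are now merged into one 2D flow field.
    return uv2 - pix[..., :2]

# Hypothetical example values: focal length 100 px, constant depth of 2.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)

# Camera shifts so that points move by -0.1 along x in the new camera frame:
# every pixel moves by 100 * (-0.1) / 2 = -5 px horizontally.
camera_only = pixel_motion_flow(depth, K, np.eye(3), np.array([-0.1, 0.0, 0.0]))

# An object moving by +0.1 along x exactly cancels that camera motion.
obj = np.zeros((48, 64, 3))
obj[..., 0] = 0.1
combined = pixel_motion_flow(depth, K, np.eye(3), np.array([-0.1, 0.0, 0.0]), obj)
```

In this toy setup the second call yields zero flow everywhere, which illustrates why treating camera and object motion separately can be ambiguous: only their combined, relative effect is visible at the pixel level.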