CV · Feb 12, 2025

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

arXiv:2502.08244v234 citations · h-index: 10 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses camera-controlled video synthesis for applications that need precise camera manipulation. It is an incremental advance that integrates optical flow into a video diffusion framework.

The paper tackles camera-controllable video generation by introducing FloVD, a video diffusion model that represents camera and object motion with optical flow. This allows training on videos without ground-truth camera parameters and supports detailed camera control. The authors report more accurate camera control and more natural object motion than previous methods.

We present FloVD, a novel video diffusion model for camera-controllable video generation. FloVD leverages optical flow to represent the motions of the camera and moving objects. This approach offers two key benefits. Since optical flow can be directly estimated from videos, our approach allows for the use of arbitrary training videos without ground-truth camera parameters. Moreover, as background optical flow encodes 3D correlation across different viewpoints, our method enables detailed camera control by leveraging the background motion. To synthesize natural object motion while supporting detailed camera control, our framework adopts a two-stage video synthesis pipeline consisting of optical flow generation and flow-conditioned video synthesis. Extensive experiments demonstrate the superiority of our method over previous approaches in terms of accurate camera control and natural object motion synthesis.
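The claim that background optical flow encodes 3D correlation across viewpoints can be illustrated with standard pinhole-camera geometry. The sketch below is not taken from the paper: the function name `background_flow`, the per-pixel depth input, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for illustration. It computes the flow that a camera motion (rotation `R`, translation `t`) induces on a static scene by back-projecting each pixel, moving it into the second camera frame, and projecting it again.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def background_flow(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
    rotation: Mat3,
    translation: Vec3,
) -> List[List[Tuple[float, float]]]:
    """Per-pixel flow (du, dv) induced by camera motion on a static scene.

    Hypothetical helper for illustration; assumes a pinhole camera and
    positive depth in both views.
    """
    flow = []
    for v, row in enumerate(depth):
        flow_row = []
        for u, z in enumerate(row):
            # Back-project pixel (u, v) at depth z into the first camera frame.
            p = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Express the point in the second camera frame: q = R p + t.
            q = tuple(
                sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            # Project into the second view and take the pixel displacement.
            u2 = fx * q[0] / q[2] + cx
            v2 = fy * q[1] / q[2] + cy
            flow_row.append((u2 - u, v2 - v))
        flow.append(flow_row)
    return flow


# Two pixels at depths 1 and 2; the scene shifts 0.1 units along -x
# in the camera frame (i.e. the camera moves to the right).
IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
example = background_flow(
    [[1.0, 2.0]], fx=100.0, fy=100.0, cx=0.0, cy=0.0,
    rotation=IDENTITY, translation=(-0.1, 0.0, 0.0),
)
```

In this example the nearer pixel moves about twice as far as the farther one, which is the parallax cue that ties the flow field to both the camera motion and the scene's 3D structure.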
