CV · Apr 2, 2024

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

arXiv:2404.02101v2 · 47.8 · 425 citations · h-index: 27 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the limited cinematic expressiveness available to users of video generation models, offering an incremental improvement by adding camera control to existing models.

The paper tackles the lack of camera pose control in text-to-video generation by introducing CameraCtrl, a plug-and-play module that enables precise camera trajectory control and proves effective across various video diffusion models.

Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely. Existing models, however, lack control over camera pose, which serves as a cinematic language for expressing deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for video diffusion models. Our approach explores an effective camera trajectory parameterization along with a plug-and-play camera pose control module that is trained on top of a video diffusion model, leaving the other modules of the base model untouched. Moreover, we conduct a comprehensive study of the effect of various training datasets, which suggests that videos with diverse camera distributions and an appearance similar to that of the base model indeed enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise camera control with different video generation models, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs.
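The abstract mentions a camera trajectory parameterization without naming it. The CameraCtrl paper represents each frame's camera with a per-pixel Plücker embedding, and the minimal NumPy sketch below shows the standard form of that embedding. It is not the authors' code: the function names, the camera-to-world rotation convention, the half-pixel offset, and the (moment, direction) channel order are all assumptions made for illustration.

```python
import numpy as np


def plucker_embedding(K, R_c2w, cam_center, height, width):
    """Per-pixel Plücker ray embedding (moment, direction), shape (H, W, 6).

    Hypothetical helper. K: 3x3 intrinsics; R_c2w: 3x3 camera-to-world
    rotation (assumed convention); cam_center: camera center in world space.
    """
    K = np.asarray(K, dtype=float)
    R_c2w = np.asarray(R_c2w, dtype=float)
    origin = np.asarray(cam_center, dtype=float)

    # Pixel-center coordinates; the +0.5 offset is an assumption.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)  # (H, W) each
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels, (H, W, 3)

    # Back-project pixels to camera-space ray directions: d_cam = K^{-1} [u, v, 1]^T.
    dirs_cam = pix @ np.linalg.inv(K).T
    # Rotate directions into world space and normalize to unit length.
    dirs = dirs_cam @ R_c2w.T
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Plücker moment m = o x d, where every ray starts at the camera center o.
    moment = np.cross(origin, dirs)
    return np.concatenate([moment, dirs], axis=-1)


def trajectory_embedding(Ks, Rs, centers, height, width):
    """Stack per-frame embeddings for a camera trajectory: shape (F, H, W, 6)."""
    return np.stack([
        plucker_embedding(K, R, c, height, width)
        for K, R, c in zip(Ks, Rs, centers)
    ])


if __name__ == "__main__":
    # Toy 3-frame trajectory: the camera slides along +x with a fixed orientation.
    K = np.array([[2.0, 0.0, 1.5], [0.0, 2.0, 1.5], [0.0, 0.0, 1.0]])
    centers = [np.array([float(i), 0.0, 0.0]) for i in range(3)]
    emb = trajectory_embedding([K] * 3, [np.eye(3)] * 3, centers, 3, 3)
    print(emb.shape)
```

One design point this illustrates: per-pixel rays give the network a dense, spatially aligned camera signal, rather than a single flat vector of extrinsic numbers per frame. How such a tensor is resized to the latent resolution and fed to the control module is not shown here.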
