CV · Mar 15

CamLit: Unified Video Diffusion with Explicit Camera and Lighting Control

Zhiyi Kuang, Chengan He, Egor Zakharov, Yuxuan Xue, Shunsuke Saito, Olivier Maury, Timur Bagautdinov, Youyi Zheng, Giljoo Nam

ETH Zurich

arXiv:2603.14241 · 89.5 · h-index: 13

AI Analysis

This work addresses the need for simple, integrated control over camera and lighting in video generation, for applications such as virtual reality and content creation. It represents a novel integration of the two controls rather than an incremental improvement to either.

CamLit tackles the problem of generating videos with controlled camera motion and lighting from a single image. It introduces a unified diffusion model that jointly performs novel view synthesis and relighting, and its outputs reach fidelity comparable to state-of-the-art methods in both tasks.

We present CamLit, the first unified video diffusion model that jointly performs novel view synthesis (NVS) and relighting from a single input image. Given one reference image, a user-defined camera trajectory, and an environment map, CamLit synthesizes a video of the scene from new viewpoints under the specified illumination. Within a single generative process, our model produces temporally coherent and spatially aligned outputs, including relit novel-view frames and corresponding albedo frames, enabling high-quality control of both camera pose and lighting. Qualitative and quantitative experiments demonstrate that CamLit achieves high-fidelity outputs on par with state-of-the-art methods in both novel view synthesis and relighting, without sacrificing visual quality in either task. We show that a single generative model can effectively integrate camera and lighting control, simplifying the video generation pipeline while maintaining competitive performance and consistent realism.

