CV · Apr 8, 2025

CamC2V: Context-aware Controllable Video Generation

arXiv:2504.06022v2 · 1 citation · h-index: 4 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a limitation of existing image-to-video models: they cannot extend beyond the provided context, which restricts their use in tasks that require faithful scene representation. It offers an incremental improvement in video generation.

The paper tackles the problem of generating coherent and context-aware videos from static images by integrating multiple image conditions with 3D constraints and camera control, resulting in improved visual quality and camera controllability as demonstrated on the RealEstate10K dataset.

Recently, image-to-video (I2V) diffusion models have demonstrated impressive scene understanding and generative quality, incorporating image conditions to guide generation. However, these models primarily animate static images without extending beyond their provided context. Introducing additional constraints, such as camera trajectories, can enhance diversity but often degrade visual quality, limiting their applicability for tasks requiring faithful scene representation. We propose CamC2V, a context-to-video (C2V) model that integrates multiple image conditions as context with 3D constraints alongside camera control to enrich both global semantics and fine-grained visual details. This enables more coherent and context-aware video generation. Moreover, we motivate the necessity of temporal awareness for an effective context representation. Our comprehensive study on the RealEstate10K dataset demonstrates improvements in visual quality and camera controllability. We will publish our code upon acceptance.
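The abstract does not say how camera trajectories are fed to the model. A common choice in camera-controlled video diffusion is a per-pixel Plücker ray embedding, and the minimal NumPy sketch below illustrates that technique only. It is not confirmed to be CamC2V's exact conditioning. It assumes world-to-camera extrinsics (`x_cam = R @ x_world + t`) and pinhole intrinsics `K`; the function names `plucker_embedding` and `trajectory_embedding` are hypothetical.

```python
import numpy as np


def plucker_embedding(K, R, t, height, width):
    """Per-pixel Plücker ray embedding of shape (height, width, 6).

    Assumption: extrinsics map world to camera (x_cam = R @ x_world + t)
    and rays are cast through pixel centres with pinhole intrinsics K.
    """
    # Camera centre in world coordinates: o = -R^T t
    origin = -R.T @ t
    # Pixel-centre grid in homogeneous image coordinates, shape (H, W, 3)
    us, vs = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([us, vs, np.ones_like(us)], axis=-1)
    # Back-project to camera-space directions: d_cam = K^{-1} p (row-vector form)
    dirs_cam = pixels @ np.linalg.inv(K).T
    # Rotate into world space: d_world = R^T d_cam, written as d_cam^T R for rows
    dirs_world = dirs_cam @ R
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    # Moment m = o x d; the Plücker coordinates of each ray are (m, d)
    moment = np.cross(origin, dirs_world)
    return np.concatenate([moment, dirs_world], axis=-1)


def trajectory_embedding(K, poses, height, width):
    """Stack per-frame embeddings for a list of (R, t) poses: (F, H, W, 6)."""
    return np.stack([plucker_embedding(K, R, t, height, width) for R, t in poses])


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
    poses = [(np.eye(3), np.zeros(3)), (np.eye(3), np.array([0.0, 0.0, -0.5]))]
    emb = trajectory_embedding(K, poses, height=48, width=64)
    print(emb.shape)  # (2, 48, 64, 6)
```

The six channels per pixel make the encoding resolution-aligned with image latents, so a model can consume it as an extra feature map per frame. How CamC2V fuses camera information with its multiple image conditions and 3D constraints is not described in this summary.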

Code implementations: 1 repo
