CE · May 29

CamGeo: Sparse Camera-Conditioned Image-to-Video Generation with 3D Geometry Priors

arXiv:2605.30895 · 43.5 · h-index: 29
Predicted impact: top 30% in CE · last 90 days
Originality: Incremental advance
AI Analysis

This work is significant for researchers and practitioners in computer vision and graphics who are working on generating realistic and geometrically consistent videos from sparse input conditions, offering an incremental improvement over existing methods.

This paper addresses the challenge of sparse camera-conditioned image-to-video generation, which aims to synthesize geometrically consistent 3D motion from minimal pose cues. The authors introduce CamGeo, a framework that distills 3D geometric knowledge from a pre-trained video-to-3D model into a diffusion backbone, resulting in consistent improvements across various sparsity ratios.

Sparse camera-conditioned image-to-video generation poses a key challenge: synthesizing geometrically consistent 3D motion from minimal pose cues. Existing methods, which largely rely on dense supervision or naive interpolation, suffer from severe pose drift and motion discontinuities because they lack robust 3D priors. In this paper, we introduce CamGeo, a framework that distills rich 3D geometric knowledge from a pre-trained video-to-3D model (VGGT) directly into the diffusion backbone. To do this without adding inference latency, we propose a training-only distillation strategy. Specifically, CamGeo incorporates: (1) keyframe trajectory distillation, which enforces cycle-consistency with the sparse input poses; (2) cross-frame consistency distillation, which applies both camera-trajectory and depth constraints to produce consistent structure across unsupervised frames; and (3) a three-stage coarse-to-fine curriculum that progressively scales geometric complexity, from global structural coherence to fine-grained refinement, to keep optimization stable. Extensive experiments show that CamGeo achieves consistent improvements across a range of sparsity ratios.
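The abstract names three training-only components but gives no loss formulas. The following is a minimal sketch of how such an objective could be assembled, under assumptions that are not from the paper: each camera pose is reduced to its 3D center in a shared frame and scale; a frozen teacher (VGGT in the paper) has already re-estimated camera centers and depth maps from the generated frames; a hypothetical lightweight student head predicts the same quantities from the diffusion backbone; losses are plain L2/L1 averages; and the curriculum is modeled as stage-dependent loss weights. All function names, stage boundaries, and weights are hypothetical.

```python
# Hypothetical sketch of CamGeo-style training-only geometry losses.
# Assumptions (not from the paper): poses are reduced to 3D camera centers in
# a shared frame/scale, depth maps are flat lists of floats, losses are plain
# L2/L1 averages, and the curriculum is a schedule of loss weights.

Vec3 = tuple[float, float, float]


def sq_dist(a: Vec3, b: Vec3) -> float:
    """Squared Euclidean distance between two camera centers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def keyframe_trajectory_loss(teacher_centers: list[Vec3],
                             sparse_poses: dict[int, Vec3]) -> float:
    """Cycle-consistency at keyframes: centers the teacher re-estimates from
    the generated frames should return to the sparse input poses."""
    total = sum(sq_dist(teacher_centers[i], c) for i, c in sparse_poses.items())
    return total / len(sparse_poses)


def cross_frame_loss(student_centers: list[Vec3], teacher_centers: list[Vec3],
                     student_depth: list[list[float]],
                     teacher_depth: list[list[float]],
                     unsup_idx: list[int], w_depth: float) -> float:
    """Distill teacher trajectory (L2) and depth (L1) into frames that have
    no input pose."""
    traj = sum(sq_dist(student_centers[i], teacher_centers[i])
               for i in unsup_idx) / len(unsup_idx)
    n_px = sum(len(student_depth[i]) for i in unsup_idx)
    depth = sum(abs(s - t) for i in unsup_idx
                for s, t in zip(student_depth[i], teacher_depth[i])) / n_px
    return traj + w_depth * depth


# Hypothetical three-stage curriculum: (last step of stage, w_key, w_cross, w_depth).
CURRICULUM = [
    (1_000, 1.0, 0.0, 0.0),  # stage 1: global structure from keyframes only
    (3_000, 1.0, 0.5, 0.0),  # stage 2: add cross-frame trajectory consistency
    (None, 1.0, 0.5, 0.5),   # stage 3: add depth for fine-grained refinement
]


def stage_weights(step: int) -> tuple[float, float, float]:
    """Return the loss weights of the curriculum stage containing `step`."""
    for end, w_key, w_cross, w_depth in CURRICULUM:
        if end is None or step <= end:
            return w_key, w_cross, w_depth
    raise ValueError("curriculum must end with an open-ended stage")


def total_loss(step: int, diffusion_loss: float,
               sparse_poses: dict[int, Vec3],
               student_centers: list[Vec3], teacher_centers: list[Vec3],
               student_depth: list[list[float]],
               teacher_depth: list[list[float]]) -> float:
    """Training objective. The teacher and the student geometry head only
    feed this loss, so neither is needed at inference time."""
    w_key, w_cross, w_depth = stage_weights(step)
    loss = diffusion_loss + w_key * keyframe_trajectory_loss(teacher_centers,
                                                             sparse_poses)
    unsup = [i for i in range(len(teacher_centers)) if i not in sparse_poses]
    if w_cross > 0 and unsup:
        loss += w_cross * cross_frame_loss(student_centers, teacher_centers,
                                           student_depth, teacher_depth,
                                           unsup, w_depth)
    return loss
```

Because the geometry terms only enter the training loss, the teacher and any auxiliary head can be discarded after training, which is one way to read the abstract's claim that the distillation adds no inference latency.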

