CV · Mar 10

ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

arXiv:2603.09819v135.31 citations · h-index: 15
Predicted impact: top 14% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating accurate novel views from limited inputs for applications in computer vision and graphics, representing an incremental improvement over existing camera-guided diffusion models.

The paper tackles novel view synthesis from two input images under large viewpoint changes. It proposes ConfCtrl, a confidence-aware video interpolation framework that lets diffusion models follow prescribed camera poses while completing unseen regions. The reported result is geometrically consistent, visually plausible novel views in which occluded regions are reconstructed effectively.

We address the challenge of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods lack the capacity to reconstruct unseen regions, while camera-guided diffusion models often deviate from intended trajectories due to noisy point cloud projections or insufficient conditioning from camera poses. To address these issues, we propose ConfCtrl, a confidence-aware video interpolation framework that enables diffusion models to follow prescribed camera poses while completing unseen regions. ConfCtrl initializes the diffusion process by combining a confidence-weighted projected point cloud latent with noise as the conditioning input. It then applies a Kalman-inspired predict-update mechanism, treating the projected point cloud as a noisy measurement and using learned residual corrections to balance pose-driven predictions with noisy geometric observations. This allows the model to rely on reliable projections while down-weighting uncertain regions, yielding stable, geometry-aware generation. Experiments on multiple datasets show that ConfCtrl produces geometrically consistent and visually plausible novel views, effectively reconstructing occluded regions under large viewpoint changes.
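The abstract describes two ideas that a small numerical sketch can make concrete: blending a projected point cloud latent with noise according to a confidence map, and a Kalman-style update that treats the projection as a noisy measurement. The code below is only an illustration under stated assumptions, not the authors' implementation. The linear blend, the function names `confidence_weighted_init` and `kalman_update`, and the per-element variances are all hypothetical. ConfCtrl is described as Kalman-inspired and uses learned residual corrections, whereas this sketch uses the classical closed-form gain.

```python
import numpy as np


def confidence_weighted_init(projected_latent, confidence, rng):
    """Blend a projected latent with Gaussian noise (assumed linear blend).

    confidence is in [0, 1] per element: 1 keeps the projection,
    0 replaces it entirely with noise.
    """
    noise = rng.standard_normal(projected_latent.shape)
    return confidence * projected_latent + (1.0 - confidence) * noise


def kalman_update(prediction, pred_var, measurement, meas_var):
    """Classical per-element Kalman update (stand-in for a learned correction).

    The gain approaches 1 when the measurement is reliable (small meas_var)
    and approaches 0 when it is uncertain (large meas_var).
    """
    gain = pred_var / (pred_var + meas_var)
    estimate = prediction + gain * (measurement - prediction)
    est_var = (1.0 - gain) * pred_var
    return estimate, est_var, gain


rng = np.random.default_rng(0)

# Toy "latent" with three elements: high, medium, and zero confidence.
projected = np.array([1.0, 1.0, 1.0])
confidence = np.array([1.0, 0.5, 0.0])
init = confidence_weighted_init(projected, confidence, rng)

# Pose-driven prediction (here all zeros) corrected by the projection,
# with measurement variance chosen to mirror the confidence levels above.
prediction = np.zeros(3)
pred_var = np.ones(3)
meas_var = np.array([1e-6, 1.0, 1e6])
estimate, est_var, gain = kalman_update(prediction, pred_var, projected, meas_var)

print("initial latent:", init)
print("gain:", gain)
print("estimate:", estimate)
```

In this toy setup the first element follows the projection almost exactly, the second is an even mix of prediction and measurement, and the third stays near the prediction. This mirrors the abstract's claim that reliable projections are trusted while uncertain regions are down-weighted.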

