CV · Mar 12

Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos

Shuo Sun, Unal Artan, Malcolm Mielle, Achim J. Lilienthal, Martin Magnusson

arXiv:2603.12064v18.11 citations · h-index: 26

Predicted impact: top 28% in CV · last 90 days
Originality: Highly original

AI Analysis

This work addresses practical 3D reconstruction from multi-view videos for applications such as event capture, moving beyond prior methods that handle only a single camera or require rigidly mounted, pre-calibrated camera rigs.

The paper tackles dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras, achieving significant performance improvements over state-of-the-art feed-forward models on synthetic and real-world benchmarks with reduced memory usage.

We address the challenging problem of dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras -- a setting that arises naturally when multiple observers capture a shared event. Prior approaches either handle only single-camera input or require rigidly mounted, pre-calibrated camera rigs, limiting their practical applicability. We propose a two-stage optimization framework that decouples the task into robust camera tracking and dense depth refinement. In the first stage, we extend single-camera visual SLAM to the multi-camera setting by constructing a spatiotemporal connection graph that exploits both intra-camera temporal continuity and inter-camera spatial overlap, enabling consistent scale and robust tracking. To ensure robustness under limited overlap, we introduce a wide-baseline initialization strategy using feed-forward reconstruction models. In the second stage, we refine depth and camera poses by optimizing dense inter- and intra-camera consistency using wide-baseline optical flow. Additionally, we introduce MultiCamRobolab, a new real-world dataset with ground-truth poses from a motion capture system. Finally, we demonstrate that our method significantly outperforms state-of-the-art feed-forward models on both synthetic and real-world benchmarks, while requiring less memory.
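
The spatiotemporal connection graph described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes synchronized videos with equal frame counts, and the function name `build_connection_graph`, the `overlap` callable, and all thresholds are hypothetical names introduced here for illustration. Nodes are (camera, frame) pairs; intra-camera edges link temporally nearby frames, and inter-camera edges link frames from different cameras whose estimated view overlap is high enough.

```python
from itertools import combinations
from typing import Callable, Set, Tuple

Node = Tuple[int, int]  # (camera_id, frame_index)
Edge = Tuple[Node, Node]


def build_connection_graph(
    num_cameras: int,
    num_frames: int,
    temporal_window: int = 2,
    sync_tolerance: int = 0,
    overlap: Callable[[Node, Node], float] = lambda a, b: 1.0,
    min_overlap: float = 0.5,
) -> Set[Edge]:
    """Return edges of a toy spatiotemporal graph (illustrative assumption only)."""
    edges: Set[Edge] = set()

    # Intra-camera edges: temporal continuity within each video.
    for cam in range(num_cameras):
        for t in range(num_frames):
            for dt in range(1, temporal_window + 1):
                if t + dt < num_frames:
                    edges.add(((cam, t), (cam, t + dt)))

    # Inter-camera edges: spatial overlap between different cameras
    # at (nearly) the same time. `overlap` is a hypothetical score in [0, 1].
    for cam_a, cam_b in combinations(range(num_cameras), 2):
        for t in range(num_frames):
            for dt in range(-sync_tolerance, sync_tolerance + 1):
                u = t + dt
                if 0 <= u < num_frames:
                    a, b = (cam_a, t), (cam_b, u)
                    if overlap(a, b) >= min_overlap:
                        edges.add((a, b))
    return edges


if __name__ == "__main__":
    graph = build_connection_graph(num_cameras=2, num_frames=3, temporal_window=1)
    for edge in sorted(graph):
        print(edge)
```

In a real system the overlap score would come from image evidence (for example, feature matches or a feed-forward reconstruction model), and each edge would contribute a constraint to the pose optimization. The inter-camera edges are what tie the cameras into a shared frame with consistent scale.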

