CV, LG · Jan 24, 2025

Light3R-SfM: Towards Feed-forward Structure-from-Motion

arXiv:2501.14914v1 · 41 citations · h-index: 6 · CVPR
Originality: Highly original
AI Analysis

This work addresses the challenge of costly matching and global optimization in SfM for 3D reconstruction tasks, offering a more scalable and efficient solution for applications in computer vision and robotics.

The paper tackles the problem of efficient large-scale Structure-from-Motion (SfM) from unconstrained image collections by introducing Light3R-SfM, a feed-forward, end-to-end learnable framework that achieves competitive accuracy while significantly reducing runtime, making it suitable for real-world applications with runtime constraints.

We present Light3R-SfM, a feed-forward, end-to-end learnable framework for efficient large-scale Structure-from-Motion (SfM) from unconstrained image collections. Existing SfM solutions rely on costly matching and global optimization to achieve accurate 3D reconstructions; Light3R-SfM addresses this limitation through a novel latent global alignment module. This module replaces traditional global optimization with a learnable attention mechanism, effectively capturing multi-view constraints across images for robust and precise camera pose estimation. Light3R-SfM constructs a sparse scene graph via a retrieval-score-guided shortest path tree to dramatically reduce memory usage and computational overhead compared to the naive approach. Extensive experiments demonstrate that Light3R-SfM achieves competitive accuracy while significantly reducing runtime, making it ideal for 3D reconstruction tasks in real-world applications with runtime constraints. This work pioneers a data-driven, feed-forward SfM approach, paving the way toward scalable, accurate, and efficient 3D reconstruction in the wild.
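The sparse scene graph step can be pictured with a small sketch. The abstract does not say how retrieval scores become edge weights or how the tree root is chosen, so the following assumes, hypothetically, that the retrieval score is the cosine similarity of global image embeddings, that an edge costs `1 - score`, and that the root is the image with the highest summed similarity. `shortest_path_tree` is an illustrative name, not the authors' code. Running Dijkstra's algorithm over the complete retrieval graph yields a tree with N - 1 edges, versus N(N - 1)/2 pairs for exhaustive pairwise processing.

```python
import heapq
from typing import Optional

import numpy as np


def shortest_path_tree(embeddings: np.ndarray, root: Optional[int] = None) -> list[tuple[int, int]]:
    """Sparse scene graph as a shortest path tree over retrieval scores.

    Hypothetical assumptions (not taken from the paper): retrieval score is the
    cosine similarity of global embeddings, edge cost is 1 - score, and the
    default root is the image with the highest summed similarity.
    """
    # L2-normalise rows so the dot product equals cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    n = sim.shape[0]
    # Lower cost means a more similar pair; clip guards against tiny negatives
    # from floating-point error so Dijkstra's non-negativity assumption holds.
    cost = np.clip(1.0 - sim, 0.0, None)

    if root is None:
        root = int(np.argmax(sim.sum(axis=1)))

    dist = np.full(n, np.inf)
    parent = [-1] * n
    visited = np.zeros(n, dtype=bool)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if visited[u]:
            continue  # stale heap entry
        visited[u] = True
        for v in range(n):  # complete graph: every other image is a neighbour
            if v == u or visited[v]:
                continue
            nd = d + cost[u, v]
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    # Every non-root image contributes exactly one (parent, child) edge.
    return [(parent[v], v) for v in range(n) if v != root]


rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))  # 8 images, toy 16-D global descriptors
edges = shortest_path_tree(emb)
print(len(edges), "tree edges vs", 8 * 7 // 2, "exhaustive pairs")  # 7 tree edges vs 28 exhaustive pairs
```

In this sketch every image stays connected to the root through high-similarity pairs, so only N - 1 image pairs would need to be processed; whether Light3R-SfM uses exactly this cost function and root choice is an assumption here.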
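The idea of replacing global optimization with attention can be illustrated by a single self-attention step over per-image latent tokens, in which every image's latent is updated with information from all other images. This is a generic scaled dot-product attention sketch under hypothetical assumptions (one D-dimensional token per image, random projection matrices `w_q`, `w_k`, `w_v`, a residual update); it is not the paper's latent global alignment module, whose architecture the abstract does not detail, and `global_latent_attention` is an illustrative name.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_latent_attention(tokens: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """One self-attention round over per-image latent tokens (hypothetical sketch).

    tokens: (N, D) array, one latent vector per image. This is a simplification;
    the paper's module is not specified in the abstract.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])  # (N, N) image-to-image affinities
    attn = softmax(scores, axis=-1)         # each row is a distribution over all images
    return tokens + attn @ v, attn          # residual update mixes in every other image


rng = np.random.default_rng(0)
n_images, d = 5, 32
tokens = rng.normal(size=(n_images, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
updated, attn = global_latent_attention(tokens, w_q, w_k, w_v)
print(updated.shape, attn.shape)  # (5, 32) (5, 5)
```

Because each row of `attn` weights all N images, the update for one image can depend on every other image in the collection, which is the kind of multi-view coupling the abstract attributes to the learnable module; a trained model would learn the projections and stack several such layers.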
