CV · LG · Jan 24, 2025

Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Sérgio Agostinho, Laura Leal-Taixé

arXiv:2501.14914v131.141 citations · h-index: 6 · CVPR

Originality: Highly original

AI Analysis

This work addresses the cost of matching and global optimization in SfM-based 3D reconstruction, offering a more scalable and efficient solution for applications in computer vision and robotics.

The paper tackles efficient large-scale Structure-from-Motion (SfM) from unconstrained image collections. It introduces Light3R-SfM, a feed-forward, end-to-end learnable framework that achieves competitive accuracy while significantly reducing runtime, making it suitable for real-world applications with runtime constraints.

We present Light3R-SfM, a feed-forward, end-to-end learnable framework for efficient large-scale Structure-from-Motion (SfM) from unconstrained image collections. Unlike existing SfM solutions that rely on costly matching and global optimization to achieve accurate 3D reconstructions, Light3R-SfM addresses this limitation through a novel latent global alignment module. This module replaces traditional global optimization with a learnable attention mechanism, effectively capturing multi-view constraints across images for robust and precise camera pose estimation. Light3R-SfM constructs a sparse scene graph via a retrieval-score-guided shortest path tree, dramatically reducing memory usage and computational overhead compared to the naive approach. Extensive experiments demonstrate that Light3R-SfM achieves competitive accuracy while significantly reducing runtime, making it ideal for 3D reconstruction tasks in real-world applications with a runtime constraint. This work pioneers a data-driven, feed-forward SfM approach, paving the way toward scalable, accurate, and efficient 3D reconstruction in the wild.
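
The sparse scene graph idea from the abstract can be illustrated with a minimal sketch. The abstract does not specify the edge cost, the root selection, or the retrieval model, so everything below is an assumption for illustration: the function name `shortest_path_tree`, the cost `1 - similarity`, the choice of image 0 as root, and the toy similarity matrix are all hypothetical and not taken from the paper. The sketch runs Dijkstra's algorithm over a dense matrix of pairwise retrieval scores and keeps only the tree edges.

```python
import heapq


def shortest_path_tree(sim, root=0):
    """Return (parent, child) edges of a shortest path tree.

    sim  : square list of lists with pairwise retrieval scores in [0, 1]
           (hypothetical input; higher means more similar).
    root : index of the image used as the tree root (assumed, not from the paper).
    """
    n = len(sim)
    dist = [float("inf")] * n
    parent = [None] * n
    done = [False] * n
    dist[root] = 0.0
    heap = [(0.0, root)]

    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        for v in range(n):
            if v == u or done[v]:
                continue
            # Assumed edge cost: dissimilar pairs are expensive to traverse.
            nd = d + (1.0 - sim[u][v])
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))

    return [(parent[v], v) for v in range(n) if parent[v] is not None]


# Toy retrieval scores for four images that form a chain 0-1-2-3.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.2],
    [0.2, 0.8, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
]
edges = shortest_path_tree(sim, root=0)
num_exhaustive_pairs = len(sim) * (len(sim) - 1) // 2
```

For N images the tree keeps N - 1 edges, whereas an exhaustive pairing (presumably the "naive approach" the abstract refers to) would process N(N - 1)/2 pairs. That gap is where the claimed memory and compute savings would come from under these assumptions.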
