CV · Aug 25, 2025

SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization

arXiv:2508.17972v111 citations · h-index: 6
Originality: Highly original
AI Analysis

This paper addresses a scalability bottleneck in Structure-from-Motion (SfM), enabling more efficient large-scale 3D reconstruction for computer vision applications.

The paper tackles the problem of scaling scene regression methods for Structure-from-Motion (SfM) to large numbers of input images. It introduces SAIL-Recon, which augments scene regression with visual localization and achieves state-of-the-art results on camera pose estimation and novel view synthesis benchmarks.

Scene regression methods, such as VGGT, solve the Structure-from-Motion (SfM) problem by directly regressing camera poses and 3D scene structures from input images. They demonstrate impressive performance in handling images under extreme viewpoint changes. However, these methods struggle to handle a large number of input images. To address this problem, we introduce SAIL-Recon, a feed-forward Transformer for large-scale SfM that augments the scene regression network with visual localization capabilities. Specifically, our method first computes a neural scene representation from a subset of anchor images. The regression network is then fine-tuned to reconstruct all input images conditioned on this neural scene representation. Comprehensive experiments show that our method not only scales efficiently to large-scale scenes, but also achieves state-of-the-art results on both camera pose estimation and novel view synthesis benchmarks, including TUM-RGBD, CO3Dv2, and Tanks & Temples. Code and models are publicly available at: https://hkust-sail.github.io/sail-recon/.
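The two-stage flow described in the abstract (build a scene representation from a subset of anchor images, then process every input image conditioned on it) can be sketched as below. This is an illustrative sketch only, not the authors' implementation: `select_anchors`, `reconstruct`, `encode_scene`, `localize`, and `batch_size` are hypothetical names, and evenly spaced anchor selection is an assumption, since the abstract does not say how anchors are chosen.

```python
from typing import Any, Callable, List, Sequence


def select_anchors(num_images: int, num_anchors: int) -> List[int]:
    """Pick evenly spaced image indices (an assumed selection rule)."""
    if num_images <= 0 or num_anchors <= 0:
        return []
    k = min(num_anchors, num_images)
    # Evenly spaced positions across the range [0, num_images).
    return [(i * num_images) // k for i in range(k)]


def reconstruct(
    images: Sequence[Any],
    num_anchors: int,
    encode_scene: Callable[[List[Any]], Any],
    localize: Callable[[Any, Sequence[Any]], List[Any]],
    batch_size: int = 32,
) -> List[Any]:
    """Two-stage flow: encode anchors once, then process all images in batches.

    `encode_scene` and `localize` are hypothetical stand-ins for the network
    passes; they are supplied by the caller and are not part of any real API.
    """
    anchor_ids = select_anchors(len(images), num_anchors)
    # Stage 1: a single scene representation built from the anchor subset.
    scene = encode_scene([images[i] for i in anchor_ids])
    # Stage 2: every image (anchors included) is processed conditioned on the
    # scene, in fixed-size batches so memory does not grow with image count.
    results: List[Any] = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        results.extend(localize(scene, batch))
    return results
```

In this sketch the cost of the first stage depends only on the number of anchors, while the second stage is linear in the number of images, which is one plausible reading of how conditioning on a fixed scene representation helps with large inputs.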

