CV, Apr 24

Long-tail Internet photo reconstruction

arXiv:2604.22714
61.53 citations
AI Analysis

This work tackles the long-tail distribution of Internet photo collections in 3D reconstruction, targeting reliable 3D models for the many real-world sites that are only sparsely photographed.

The paper addresses 3D reconstruction from sparse, noisy Internet photos of long-tail scenes. The authors simulate sparse training data by sampling subsets of images from well-reconstructed landmarks, and they introduce the MegaDepth-X dataset. Finetuning 3D foundation models on this data yields robust reconstructions under extreme sparsity and more reliable results on symmetric and repetitive scenes, while preserving generalization to standard dense benchmarks.

Internet photo collections exhibit an extremely long-tailed distribution: a few famous landmarks are densely photographed and easily reconstructed in 3D, while most real-world sites are represented with sparse, noisy, uneven imagery beyond the capabilities of both classical and learned 3D methods. We believe that tackling this long-tail regime represents one of the next frontiers for 3D foundation models. Although reliable ground-truth 3D supervision from sparse scenes is challenging to acquire, we observe that it can be effectively simulated by sampling sparse subsets from well-reconstructed Internet landmarks. To this end, we introduce MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, together with a strategy for sampling sets of training images that mimic camera distributions in long-tail scenes. Finetuning 3D foundation models with these components yields robust reconstructions under extreme sparsity, and also enables more reliable reconstruction in symmetric and repetitive scenes, while preserving generalization to standard, dense 3D benchmark datasets.
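The abstract does not spell out how sparse training sets are drawn, so the following is only an illustrative sketch of the general idea: take the cameras of a densely reconstructed landmark and keep a small, uneven subset. The function name `sample_sparse_subset`, its parameters, and the "part clustered near a seed view, part scattered" heuristic are all assumptions made for illustration, not the sampling strategy used for MegaDepth-X.

```python
import math
import random


def sample_sparse_subset(camera_positions, min_views=2, max_views=8,
                         clustered_fraction=0.5, rng=None):
    """Return indices of a small, uneven subset of views from a dense scene.

    Hypothetical heuristic (not from the paper): pick a random seed view,
    keep some of its nearest neighbours, and fill the rest with views drawn
    at random from the remaining cameras.
    """
    rng = rng or random.Random()
    n = len(camera_positions)
    if n < min_views:
        raise ValueError("scene has fewer cameras than min_views")

    # Number of views to keep, capped by the scene size.
    k = rng.randint(min_views, min(max_views, n))
    seed = rng.randrange(n)

    # All other cameras, ordered by distance to the seed camera.
    others = sorted(
        (i for i in range(n) if i != seed),
        key=lambda i: math.dist(camera_positions[i], camera_positions[seed]),
    )

    # Part of the subset clusters near the seed, the rest is scattered.
    n_near = round((k - 1) * clustered_fraction)
    near = others[:n_near]
    scattered = rng.sample(others[n_near:], k - 1 - n_near)
    return [seed] + near + scattered


# Usage: 100 cameras on a circle around a landmark (synthetic positions).
cameras = [(math.cos(2 * math.pi * t / 100), math.sin(2 * math.pi * t / 100), 0.0)
           for t in range(100)]
subset = sample_sparse_subset(cameras, rng=random.Random(0))
print(sorted(subset))
```

Because the subset comes from a scene that was already reconstructed from all of its images, the dense reconstruction can stand in as ground truth for the sparse subset. This is the property the abstract relies on when it describes simulating sparse supervision.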

