Distributed Global Structure-from-Motion with a Deep Front-End
This work addresses the challenge of enhancing global SfM for computer vision applications, but its contribution is modest, as it shows limited gains over existing methods.
The authors tackled the problem of improving global Structure-from-Motion (SfM) to match state-of-the-art incremental methods by integrating deep-learning-based feature extraction and matching, but found that none of the deep front-ends outperformed classical SIFT features when compared against incremental SfM results across a range of datasets.
While initial approaches to Structure-from-Motion (SfM) revolved around both global and incremental methods, most recent applications rely on incremental systems to estimate camera poses due to their superior robustness. Though there has been tremendous progress in SfM "front-ends" powered by deep models learned from data, the state-of-the-art (incremental) SfM pipelines still rely on classical SIFT features, developed in 2004. In this work, we investigate whether leveraging the developments in feature extraction and matching helps global SfM perform on par with the state-of-the-art incremental SfM approach (COLMAP). To do so, we design a modular SfM framework that allows us to easily combine developments in different stages of the SfM pipeline. Our experiments show that while developments in deep-learning-based two-view correspondence estimation do translate to improvements in point density for scenes reconstructed with global SfM, none of them outperforms SIFT when comparing with incremental SfM results on a range of datasets. Our SfM system is designed from the ground up to leverage distributed computation, enabling us to parallelize computation on multiple machines and scale to large scenes.
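To make the idea of a modular, parallelizable front-end concrete, the following is a minimal sketch and not the system's actual interface. The names `FrontEnd`, `run_two_view_stage`, `toy_detect`, and `toy_match` are hypothetical, the detector and matcher are toy stand-ins rather than SIFT or a learned model, and a local thread pool stands in for the multi-machine scheduler, which the abstract does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Keypoints = List[Tuple[float, float]]  # (x, y) pixel locations
Matches = List[Tuple[int, int]]        # (index in image i, index in image j)


@dataclass(frozen=True)
class FrontEnd:
    """Hypothetical plug-in point: any detector/matcher pair can be swapped in."""
    name: str
    detect: Callable[[str], Keypoints]
    match: Callable[[Keypoints, Keypoints], Matches]


def run_two_view_stage(
    images: Sequence[str], front_end: FrontEnd, max_workers: int = 4
) -> Dict[Tuple[int, int], Matches]:
    """Detect features per image, then match every image pair in parallel.

    Each pair is an independent task, which is what makes this stage easy
    to distribute. Threads are used here only as a local stand-in.
    """
    pairs = list(combinations(range(len(images)), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        keypoints = list(pool.map(front_end.detect, images))
        results = pool.map(
            lambda p: front_end.match(keypoints[p[0]], keypoints[p[1]]), pairs
        )
        return dict(zip(pairs, results))


# Toy stand-ins (assumptions for illustration only, not real feature methods).
def toy_detect(image_name: str) -> Keypoints:
    # One fake keypoint per character of the file name.
    return [(float(i), float(i)) for i in range(len(image_name))]


def toy_match(a: Keypoints, b: Keypoints) -> Matches:
    # Pair keypoints by index up to the shorter list.
    return [(i, i) for i in range(min(len(a), len(b)))]


TOY_FRONT_END = FrontEnd(name="toy", detect=toy_detect, match=toy_match)
```

Under these assumptions, comparing front-ends amounts to passing a different `FrontEnd` instance to `run_two_view_stage` while the downstream global SfM stages stay unchanged.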