CV · Dec 20, 2019

DeepSFM: Structure From Motion Via Deep Bundle Adjustment

arXiv:1912.09697v2 · 131 citations
AI Analysis

This paper addresses the SfM problem for computer vision applications with a more practical deep learning solution that does not require accurate pose inputs, though it builds incrementally on existing cost volume methods.

The paper tackles the problem of structure from motion (SfM) in computer vision by proposing DeepSFM, a deep learning model inspired by traditional bundle adjustment that jointly estimates depth and pose without relying on accurate camera poses. It achieves state-of-the-art performance on depth and pose estimation with improved robustness against limited inputs and noisy initialization.

Structure from motion (SfM) is an essential computer vision problem that has not been well handled by deep learning. One promising trend is to build explicit structural constraints, e.g. a 3D cost volume, into the network. However, existing methods usually assume accurate camera poses, either from ground truth (GT) or from other methods, which is unrealistic in practice. In this work, we design a physically driven architecture, namely DeepSFM, inspired by traditional Bundle Adjustment (BA). It consists of two cost-volume-based architectures, one for depth estimation and one for pose estimation, which run iteratively to improve both. The explicit constraints on both depth (structure) and pose (motion), combined with the learning components, bring the merits of both traditional BA and emerging deep learning technology. Extensive experiments on various datasets show that our model achieves state-of-the-art performance on both depth and pose estimation, with superior robustness to fewer inputs and to noise in initialization.
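
The abstract does not spell out how a cost volume ties depth and pose together, so the following is only a minimal sketch of the standard pinhole reprojection that plane-sweep style cost volumes generally rely on. It is not the authors' code. The function name `reproject`, the shared-intrinsics assumption, and the toy camera values are all hypothetical and chosen for illustration.

```python
import numpy as np


def reproject(pixels, depths, K, R, t):
    """Map target-view pixels at hypothesised depths into a source view.

    pixels: (N, 2) array of (u, v) coordinates in the target image.
    depths: (D,) array of depth hypotheses.
    K:      (3, 3) pinhole intrinsics, assumed shared by both views.
    R, t:   rotation (3, 3) and translation (3,) taking target-camera
            coordinates to source-camera coordinates.
    Returns an (N, D, 2) array of source-view pixel coordinates.
    """
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])                    # (N, 3)
    rays = homog @ np.linalg.inv(K).T                    # (N, 3), z = 1
    points = rays[:, None, :] * depths[None, :, None]    # (N, D, 3)
    points_src = points @ R.T + t                        # (N, D, 3)
    proj = points_src @ K.T                              # (N, D, 3)
    return proj[..., :2] / proj[..., 2:3]


# Toy setup (hypothetical values): focal length 100, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
pixels = np.array([[50.0, 50.0], [60.0, 40.0]])
depths = np.array([1.0, 2.0, 4.0])

# Identity pose: every depth hypothesis maps a pixel back onto itself.
same = reproject(pixels, depths, K, np.eye(3), np.zeros(3))

# Translation of 0.1 along x: u shifts by fx * 0.1 / depth, v is unchanged.
shifted = reproject(pixels, depths, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

In a cost volume, features sampled at these reprojected locations would be compared against the target-view features for each hypothesis. A depth cost volume sweeps over `depths` with the pose held fixed, while a pose cost volume, in the same spirit, would sweep over candidate `(R, t)` values with the depth held fixed. Alternating the two is the iterative scheme the abstract describes at a high level.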

Code Implementations: 1 repo