Structure-from-Motion using Dense CNN Features with Keypoint Relocalization
This work addresses accurate 3D reconstruction from images with severe appearance variations. It is an incremental improvement that extends existing dense-feature methods with keypoint relocalization.
The paper tackles Structure from Motion (SfM) under extreme appearance changes by proposing a pipeline that uses dense CNN features with keypoint relocalization to achieve pixel-level accuracy. On the Aachen Day-Night dataset, it outperforms a state-of-the-art SfM pipeline (COLMAP using RootSIFT) by a large margin.
Structure from Motion (SfM) using imagery that involves extreme appearance changes is still a challenging task due to a loss of feature repeatability. Using feature correspondences obtained by matching densely extracted convolutional neural network (CNN) features significantly improves the SfM reconstruction capability. However, the reconstruction accuracy is limited by the spatial resolution of the extracted CNN features, which in the existing approach does not reach pixel-level accuracy. Providing dense feature matches with precise keypoint positions is not trivial because of the memory limitations and computational burden of dense features. To achieve accurate SfM reconstruction with highly repeatable dense features, we propose an SfM pipeline that uses dense CNN features with keypoint relocalization, which efficiently and accurately provides pixel-level feature correspondences. We demonstrate on the Aachen Day-Night dataset that the proposed SfM using dense CNN features with keypoint relocalization outperforms a state-of-the-art SfM pipeline (COLMAP using RootSIFT) by a large margin.
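To make the idea of keypoint relocalization concrete, the following is a minimal sketch of one plausible coarse-to-fine scheme, not the method as specified above. It assumes that each finer feature map has exactly twice the resolution of the previous one, that a coarse cell therefore maps to a 2x2 block of finer cells, and that the refined position is the cell in that block with the largest feature norm. The function name `relocalize`, the 2x scale factor, and the norm-based selection rule are all illustrative assumptions.

```python
import numpy as np


def relocalize(keypoints, finer_maps):
    """Refine coarse (row, col) keypoints through successively finer maps.

    keypoints  : (N, 2) integer positions on the coarsest feature map.
    finer_maps : list of (H, W, C) arrays ordered coarse to fine; each level
                 is ASSUMED to have twice the resolution of the previous one.
    Returns an (N, 2) integer array of positions on the finest map.
    """
    kps = np.asarray(keypoints, dtype=np.int64).reshape(-1, 2)
    # The four cells of the 2x2 block that a coarse cell covers one level down.
    offsets = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.int64)

    for fmap in finer_maps:
        # ASSUMPTION: feature norm serves as the saliency score.
        norms = np.linalg.norm(fmap, axis=2)                 # (H, W)
        cand = 2 * kps[:, None, :] + offsets[None, :, :]     # (N, 4, 2)
        # Guard against maps whose size is not an exact multiple of two.
        cand[..., 0] = np.clip(cand[..., 0], 0, norms.shape[0] - 1)
        cand[..., 1] = np.clip(cand[..., 1], 0, norms.shape[1] - 1)
        scores = norms[cand[..., 0], cand[..., 1]]           # (N, 4)
        best = scores.argmax(axis=1)                         # (N,)
        kps = cand[np.arange(len(kps)), best]                # (N, 2)
    return kps
```

Because only the 2x2 neighbourhood of each matched keypoint is inspected at every level, the dense descriptors of the finer maps never need to be matched exhaustively, which is one way the memory and computation concerns raised above could be kept in check. Feature matching itself would still be done once, on the coarsest map, before calling `relocalize` on both images of a pair.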