Scene Coordinate and Correspondence Learning for Image-Based Localization
This work addresses camera pose estimation from a single RGB image, offering a generalized solution with incremental improvements over existing re-localization methods.
The paper tackles camera re-localization by proposing a deep learning method that regresses scene coordinates pixel-wise and predicts correspondence confidences, allowing immediate discarding of erroneous predictions and improving initial pose estimates.
Scene coordinate regression has become an essential part of current camera re-localization methods. Different variants, such as regression forests and deep learning methods, have been successfully applied to estimate the camera pose corresponding to a single input image. In this work, we propose to regress the scene coordinates pixel-wise for a given RGB image using deep learning. In contrast to recent methods, which usually employ RANSAC to obtain a robust pose estimate from the established point correspondences, we propose to regress confidences for these correspondences, which allows us to immediately discard erroneous predictions and improve the initial pose estimates. Finally, the resulting confidences can be used to score initial pose hypotheses and aid in pose refinement, offering a generalized solution to this task.
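To make the role of the regressed confidences concrete, the following is a minimal sketch of how they might be used, not the implementation from this work. It assumes a pinhole camera with a world-to-camera pose (R, t), and every name and value in it (`filter_by_confidence`, `score_hypothesis`, `min_conf`, `max_error_px`) is hypothetical. The confidence-weighted inlier sum is one plausible scoring rule, chosen here for illustration; the abstract does not specify the scoring function.

```python
import math

def filter_by_confidence(pixels, scene_points, confidences, min_conf=0.5):
    """Keep correspondences whose predicted confidence is at least min_conf.

    min_conf is an assumed threshold, not a value taken from the paper.
    """
    return [
        (pixel, point, conf)
        for pixel, point, conf in zip(pixels, scene_points, confidences)
        if conf >= min_conf
    ]

def project(point, rotation, translation, focal, principal):
    """Pinhole projection of a 3D scene point under a world-to-camera pose."""
    cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 1e-9:
        return None  # point lies behind (or on) the camera plane
    u = focal * cam[0] / cam[2] + principal[0]
    v = focal * cam[1] / cam[2] + principal[1]
    return (u, v)

def score_hypothesis(correspondences, rotation, translation,
                     focal, principal, max_error_px=10.0):
    """Sum the confidences of correspondences that reproject within max_error_px.

    This weighted inlier count is an illustrative assumption about how
    confidences could score a pose hypothesis.
    """
    score = 0.0
    for pixel, point, conf in correspondences:
        projected = project(point, rotation, translation, focal, principal)
        if projected is None:
            continue
        error = math.hypot(projected[0] - pixel[0], projected[1] - pixel[1])
        if error <= max_error_px:
            score += conf
    return score

# Toy data: three consistent correspondences and one low-confidence outlier.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixels = [(50.0, 50.0), (100.0, 50.0), (50.0, 75.0), (10.0, 10.0)]
scene_points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0), (1.0, 1.0, 1.0)]
confidences = [0.9, 0.8, 0.7, 0.1]

kept = filter_by_confidence(pixels, scene_points, confidences)
good_score = score_hypothesis(kept, IDENTITY, (0.0, 0.0, 0.0), 100.0, (50.0, 50.0))
bad_score = score_hypothesis(kept, IDENTITY, (1.0, 0.0, 0.0), 100.0, (50.0, 50.0))
```

In this toy setup the low-confidence outlier is discarded before any pose is evaluated, and the hypothesis that agrees with the remaining correspondences receives the higher score. In practice the surviving correspondences would be passed to a PnP solver to produce the hypotheses being scored.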