Reuse your features: unifying retrieval and feature-metric alignment
This work addresses visual localization for robotics and AR/VR applications, presenting an incremental improvement that unifies image retrieval, candidate re-ranking, and pose estimation in a single network.
The authors tackle visual localization with a unified pipeline that reuses deep features across retrieval, re-ranking, and pose estimation, achieving competitive robustness and accuracy at lower computational and memory cost than multi-network approaches.
We propose a compact pipeline that unifies all the steps of visual localization: image retrieval, candidate re-ranking and initial pose estimation, and camera pose refinement. Our key assumption is that the deep features used for these individual tasks share common characteristics, so they can be reused across every stage of the pipeline. Our DRAN (Deep Retrieval and image Alignment Network) extracts global descriptors for efficient image retrieval and uses intermediate hierarchical features to re-rank the retrieval list and produce an initial pose guess, which is finally refined by a feature-metric optimization based on learned deep multi-scale dense features. DRAN is the first single network able to produce the features for all three steps of visual localization. It achieves competitive robustness and accuracy under challenging conditions on public benchmarks, outperforming other unified approaches at lower computational and memory cost than counterparts that use multiple networks. Code and models will be publicly available at https://github.com/jmorlana/DRAN.
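The first two stages of such a pipeline (retrieve with global descriptors, then re-rank the shortlist with finer features) can be illustrated with a toy sketch. This is not the DRAN implementation: the function names, the use of cosine similarity, and the mutual-nearest-neighbour count used as a re-ranking score are all assumptions made for illustration, and the descriptors are plain Python lists standing in for network outputs. Pose estimation and feature-metric refinement are omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_global, db_globals, k):
    """Return indices of the k database images whose global descriptors
    are most similar to the query (hypothetical retrieval step)."""
    scores = [(cosine(query_global, g), i) for i, g in enumerate(db_globals)]
    # Highest similarity first; ties broken by lower database index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


def mutual_matches(query_locals, ref_locals):
    """Count mutual nearest neighbours between two sets of local descriptors.
    This score is an assumption standing in for a learned re-ranking criterion."""
    if not query_locals or not ref_locals:
        return 0
    q2r = [max(range(len(ref_locals)), key=lambda j: cosine(q, ref_locals[j]))
           for q in query_locals]
    r2q = [max(range(len(query_locals)), key=lambda i: cosine(query_locals[i], r))
           for r in ref_locals]
    return sum(1 for i, j in enumerate(q2r) if r2q[j] == i)


def rerank(query_locals, db_locals, candidates):
    """Reorder retrieved candidates by descending mutual-match count.
    sorted() is stable, so ties keep their retrieval order."""
    scored = [(mutual_matches(query_locals, db_locals[c]), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda s: -s[0])]
```

A caller would pass the output of `retrieve` as the `candidates` argument of `rerank`. In a unified design, both the global and the local descriptors would come from a single forward pass of the same network, which is where the reuse described in the abstract would save computation and memory.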