D2-Net: A Trainable CNN for Joint Detection and Description of Local Features
D2-Net addresses the challenge of robust image matching and localization in computer vision applications. It is better characterized as a novel hybrid approach than as a foundational breakthrough.
The paper tackles the problem of finding reliable pixel-level correspondences under difficult imaging conditions by proposing D2-Net, a single CNN that jointly performs dense feature description and detection. It reports state-of-the-art performance on the Aachen Day-Night and InLoc localization benchmarks.
In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction.
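To make the "describe, then detect" idea concrete, the following is a minimal NumPy sketch of how keypoints and descriptors could both be read off one dense feature map. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not spell out a detection rule; the rule used here (a pixel is a keypoint when it is a spatial local maximum in its strongest channel) follows the hard detection criterion described in the D2-Net paper. The function name `describe_and_detect`, the 3x3 neighbourhood, and the random stand-in for CNN output are all hypothetical choices made for this example.

```python
import numpy as np


def describe_and_detect(features):
    """Sketch: extract keypoints and descriptors from one dense feature map.

    features: array of shape (C, H, W), standing in for the CNN output.
    Returns (keypoints, descriptors): keypoints is (N, 2) as (row, col),
    descriptors is (N, C) with unit L2 norm.
    """
    features = np.asarray(features, dtype=np.float64)
    c, h, w = features.shape

    # Description: the C-dimensional vector at each pixel, L2-normalised.
    norms = np.linalg.norm(features, axis=0, keepdims=True)
    descriptors = features / np.maximum(norms, 1e-12)

    # Detection, step 1: the channel with the strongest response per pixel.
    best_channel = features.argmax(axis=0)  # shape (H, W)

    # Detection, step 2: 3x3 spatial maximum within every channel.
    # Borders are padded with -inf so edge pixels can still be maxima.
    # (The 3x3 window size is an assumption of this sketch.)
    padded = np.pad(features, ((0, 0), (1, 1), (1, 1)),
                    constant_values=-np.inf)
    neighbourhood_max = np.full(features.shape, -np.inf)
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]
            neighbourhood_max = np.maximum(neighbourhood_max, window)
    is_local_max = features >= neighbourhood_max  # ties count as maxima

    # A pixel is kept if it is a local maximum in its own strongest channel.
    detected = np.take_along_axis(
        is_local_max, best_channel[None, :, :], axis=0)[0]

    keypoints = np.argwhere(detected)  # (N, 2) rows of (row, col)
    kp_descriptors = descriptors[:, keypoints[:, 0], keypoints[:, 1]].T
    return keypoints, kp_descriptors


# Usage with random data in place of real CNN features (illustrative only).
rng = np.random.default_rng(0)
demo_features = rng.random((4, 6, 6))
demo_keypoints, demo_descriptors = describe_and_detect(demo_features)
```

Because detection operates on the same high-level feature maps that supply the descriptors, no separate low-level detector is needed, which is the design choice the abstract credits for more stable keypoints.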