SEKD: Self-Evolving Keypoint Detection and Description
This work addresses a bottleneck in computer vision for tasks requiring robust local features, such as image matching and 3D reconstruction, though it appears incremental as it builds on existing DNN-based approaches.
The paper tackles insufficient interaction between the local feature detector and descriptor in deep learning-based local feature models. It proposes SEKD, a self-supervised framework that emphasizes repeatability and reliability, and reports that it outperforms existing hand-crafted and DNN-based methods by remarkable margins on tasks such as homography estimation.
Inspired by the recent successes of deep neural networks (DNNs) on a variety of vision tasks, researchers have attempted to use DNNs to learn novel local features from images. However, existing DNN-based algorithms have not achieved comparably remarkable progress, which can be partly attributed to insufficient use of the interactive characteristics of the local feature detector and descriptor. To alleviate these difficulties, we emphasize two desired properties, i.e., repeatability and reliability, which together summarize the inherent and interactive characteristics of the local feature detector and descriptor. Guided by these properties, we propose a self-supervised framework, namely self-evolving keypoint detection and description (SEKD), to learn an advanced local feature model from unlabeled natural images. Additionally, to provide performance guarantees, we design dedicated training strategies that minimize the gap between the learned features and these properties. We benchmark the proposed method on homography estimation, relative pose estimation, and structure-from-motion tasks. Extensive experimental results demonstrate that the proposed method outperforms popular hand-crafted and DNN-based methods by remarkable margins. Ablation studies also verify the effectiveness of each critical training strategy. We will release our code along with the trained model publicly.
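To make the repeatability property concrete, here is a minimal sketch of one commonly used way to measure it between two images related by a known homography. It is an illustration only and is not taken from the SEKD paper: the function names `warp_point` and `repeatability`, the pixel threshold `eps`, and the toy keypoints are all assumptions, and the paper's own definition may differ.

```python
import math

def warp_point(point, H):
    """Map an (x, y) point through a 3x3 homography H (nested lists)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def repeatability(kpts_a, kpts_b, H, eps=3.0):
    """Fraction of keypoints from image A that, once warped into image B,
    lie within eps pixels of some keypoint detected in image B.
    (Assumed definition for illustration, not the paper's.)"""
    if not kpts_a or not kpts_b:
        return 0.0
    repeated = 0
    for p in kpts_a:
        wx, wy = warp_point(p, H)
        nearest = min(math.hypot(wx - bx, wy - by) for bx, by in kpts_b)
        if nearest <= eps:
            repeated += 1
    return repeated / len(kpts_a)

# Toy example: image B is image A shifted 5 pixels to the right.
H_shift = [[1.0, 0.0, 5.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
kpts_a = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
kpts_b = [(15.0, 10.0), (25.0, 20.0), (100.0, 100.0)]
score = repeatability(kpts_a, kpts_b, H_shift)
```

In this toy case two of the three keypoints from image A are re-detected in image B, so `score` is 2/3. A detector with higher repeatability finds the same scene points more consistently across viewpoint changes, which is what tasks such as homography estimation depend on.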