SiLK -- Simple Learned Keypoints
This work addresses the challenge of inconsistent and hard-to-interpret results in learned keypoint detectors for computer vision tasks like image matching and 3D reconstruction, offering a more interpretable and high-performing solution.
The authors set out to design a simple and effective learned keypoint detector by deconstructing existing methods and redesigning each component from first principles. The result, SiLK, achieves state-of-the-art performance on Detection Repeatability and Homography Estimation on HPatches and on 3D Point-Cloud Registration on ScanNet, with competitive results on camera pose estimation.
Keypoint detection and descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction, and visual odometry. Hand-engineered methods like Harris corners, SIFT, and HOG descriptors have been used for decades; more recently, there has been a trend to introduce learning in an attempt to improve keypoint detectors. On inspection, however, the results are difficult to interpret: recent learning-based methods employ a vast diversity of experimental setups and design choices, and empirical results are often reported using different backbones, protocols, datasets, types of supervision, or tasks. Since these differences are often coupled together, this raises the natural question of what makes a good learned keypoint detector. In this work, we revisit the design of existing keypoint detectors by deconstructing their methodologies and identifying the key components. We redesign each component from first principles and propose Simple Learned Keypoints (SiLK), which is fully differentiable, lightweight, and flexible. Despite its simplicity, SiLK sets a new state of the art on the Detection Repeatability and Homography Estimation tasks on HPatches and the 3D Point-Cloud Registration task on ScanNet, and achieves performance competitive with the state of the art on camera pose estimation in the 2022 Image Matching Challenge and on ScanNet.