ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction
This work addresses the need for efficient and accurate keypoint detection and descriptor extraction in real-time computer vision applications, representing an incremental improvement over existing methods.
The paper tackles the problem of non-differentiable keypoint detection in computer vision by introducing a partially differentiable module that outputs accurate sub-pixel keypoints. It achieves performance equivalent to state-of-the-art methods on tasks like homography estimation while running at 95 frames per second for 640x480 images on a GPU.
Existing methods detect keypoints in a non-differentiable way, so they cannot directly optimize keypoint positions through back-propagation. To address this issue, we present a partially differentiable keypoint detection module, which outputs accurate sub-pixel keypoints. A reprojection loss is then proposed to directly optimize these sub-pixel keypoints, and a dispersity peak loss is presented for accurate keypoint regularization. We also extract the descriptors in a sub-pixel way, and they are trained with the stable neural reprojection error loss. Moreover, a lightweight network is designed for keypoint detection and descriptor extraction, which can run at 95 frames per second for 640x480 images on a commercial GPU. On homography estimation, camera pose estimation, and visual (re-)localization tasks, the proposed method achieves performance equivalent to state-of-the-art approaches while greatly reducing the inference time.
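
To illustrate how a sub-pixel keypoint can be obtained from a score map using only smooth operations, the following is a minimal sketch that assumes a soft-argmax (softmax-weighted mean of coordinates) over a small window around an integer peak. The abstract does not specify the exact formulation, so this should be read as one plausible instance rather than the paper's module. The names `gaussian_blob` and `soft_argmax_refine` and the parameters `radius` and `temperature` are hypothetical.

```python
import numpy as np


def gaussian_blob(shape, center_yx, sigma=2.0):
    """Synthetic score map with one smooth peak (illustration only)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    cy, cx = center_yx
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))


def soft_argmax_refine(score_map, peak_yx, radius=2, temperature=0.1):
    """Refine an integer peak to a sub-pixel location.

    Assumption: the refinement is a softmax-weighted mean of the pixel
    coordinates in a (2*radius+1)^2 window, clipped to the image bounds.
    """
    y0, x0 = int(peak_yx[0]), int(peak_yx[1])
    h, w = score_map.shape
    ys = np.arange(max(y0 - radius, 0), min(y0 + radius + 1, h))
    xs = np.arange(max(x0 - radius, 0), min(x0 + radius + 1, w))
    patch = score_map[np.ix_(ys, xs)]

    # Softmax over the window; subtracting the max keeps exp() stable.
    weights = np.exp((patch - patch.max()) / temperature)
    weights /= weights.sum()

    # Expected coordinates under the softmax weights.
    sub_y = float((weights.sum(axis=1) * ys).sum())
    sub_x = float((weights.sum(axis=0) * xs).sum())
    return sub_y, sub_x


if __name__ == "__main__":
    # The true peak lies between pixels; the integer argmax cannot represent it.
    scores = gaussian_blob((20, 30), center_yx=(10.3, 14.7))
    peak = np.unravel_index(np.argmax(scores), scores.shape)
    print("integer peak:", tuple(int(v) for v in peak))
    print("refined peak:", soft_argmax_refine(scores, peak))
```

Every step inside `soft_argmax_refine` is a smooth function of the scores, so in an automatic differentiation framework a loss on the refined coordinates could propagate gradients back to the score map. Choosing the integer peak itself (here a plain `argmax`) remains a discrete step, which is one way to read the term "partially differentiable".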