CV · Nov 4, 2020

Realtime CNN-based Keypoint Detector with Sobel Filter and CNN-based Descriptor Trained with Keypoint Candidates

arXiv:2011.02119v1 · 3 citations
AI Analysis

This work addresses the need for efficient local feature extraction in applications like SLAM and 3D reconstruction, but it is incremental as it builds on existing CNN-based approaches with specific optimizations.

The paper tackles real-time keypoint detection and description for computer vision tasks by introducing lightweight CNNs, SobelNet and DesNet, achieving inference times of 7.59ms and 1.09ms respectively on an RTX 2070 SUPER for 640x480 images, with performance comparable to state-of-the-art methods on benchmarks like HPatches.

Local feature detectors and descriptors are essential in many computer vision tasks, such as SLAM and 3D reconstruction. In this paper, we introduce two separate CNNs, the lightweight SobelNet and DesNet, to detect keypoints and to compute dense local descriptors. The detector and the descriptor work in parallel. A Sobel filter extracts the edge structure of the input image, which serves as the input to the CNN. Keypoint locations are obtained by applying non-maximum suppression (NMS) to the output map of the CNN. We design a Gaussian loss for training SobelNet to detect corner points as keypoints. The input of DesNet, in contrast, is the original grayscale image, and circle loss is used to train DesNet. In addition, the output maps of SobelNet are needed while training DesNet. We have evaluated our method on several benchmarks, including HPatches, the ETH benchmark, and FM-Bench. SobelNet achieves better or comparable performance with less computation than recent SOTA methods. For a 640x480 image, inference takes 7.59ms for SobelNet and 1.09ms for DesNet on an RTX 2070 SUPER.
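The two non-learned steps in the abstract, Sobel filtering before the detector CNN and NMS after it, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `sobel_magnitude` and `nms_keypoints`, the `radius` and `threshold` parameters, and the use of gradient magnitude are assumptions made for illustration. The CNN itself is omitted, so a hand-made score map stands in for its output. Plain Python lists are used to keep the example self-contained.

```python
# Illustrative sketch only: names and parameters are hypothetical,
# and the detector CNN between the two steps is not modelled.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude of a 2D grayscale image given as a list of rows.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def nms_keypoints(score, radius=1, threshold=0.5):
    """Return (row, col) positions that pass the threshold and are the
    maximum of their (2*radius+1)^2 neighbourhood. Tied maxima are all kept.
    """
    h, w = len(score), len(score[0])
    keypoints = []
    for y in range(h):
        for x in range(w):
            s = score[y][x]
            if s < threshold:
                continue
            window = [
                score[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            if s >= max(window):
                keypoints.append((y, x))
    return keypoints


# A vertical step edge: the Sobel response is non-zero only around the step.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
edges = sobel_magnitude(image)

# A stand-in for the CNN output map: the weaker neighbour is suppressed.
score_map = [[0.0, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 0.6]]
keypoints = nms_keypoints(score_map)  # [(1, 1)]
```

A real-time implementation would run both steps as batched GPU operations rather than Python loops. The loops here only make the per-pixel logic explicit.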

