CV, Jul 7, 2019

ELF: Embedded Localisation of Features in pre-trained CNN

arXiv:1907.03261v1, 30 citations
Originality: Incremental advance
AI Analysis

The method detects features without any supervised training, so computer vision applications can reuse existing pre-trained models as keypoint detectors.

The paper tackles the problem of extracting feature locations from pre-trained CNNs without additional training. It shows that gradient-based saliency maps can detect keypoints with repeatability and matchability comparable to supervised detectors on standard datasets.

This paper introduces a novel feature detector based only on information embedded inside a CNN trained on standard tasks (e.g. classification). While previous works already show that the features of a trained CNN are suitable descriptors, we show here how to extract the feature locations from the network to build a detector. This information is computed from the gradient of the feature map with respect to the input image. This provides a saliency map with local maxima on relevant keypoint locations. Contrary to recent CNN-based detectors, this method requires neither supervised training nor finetuning. We evaluate how repeatable and how matchable the detected keypoints are with the repeatability and matching scores. Matchability is measured with a simple descriptor introduced for the sake of the evaluation. This novel detector reaches similar performance on the standard evaluation HPatches dataset, as well as comparable robustness against illumination and viewpoint changes on Webcam and photo-tourism images. These results show that a CNN trained on a standard task embeds feature location information that is as relevant as when the CNN is specifically trained for feature detection.
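The idea of reading keypoints off a gradient-based saliency map can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. A single hand-picked 3x3 filter followed by a ReLU stands in for the pre-trained CNN, half the squared norm of the feature map is assumed as the scalar being differentiated, and the names `forward`, `objective`, `saliency`, and `local_maxima` are hypothetical.

```python
# Toy stand-in for a pre-trained CNN: one fixed 3x3 filter plus ReLU (assumption).
KERNEL = [[0.0, -1.0, 0.0],
          [-1.0, 4.0, -1.0],
          [0.0, -1.0, 0.0]]


def forward(image):
    """Valid 3x3 cross-correlation followed by ReLU; returns the feature map."""
    h, w = len(image), len(image[0])
    feat = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            z = sum(KERNEL[dy][dx] * image[y + dy][x + dx]
                    for dy in range(3) for dx in range(3))
            feat[y][x] = max(z, 0.0)
    return feat


def objective(image):
    """Scalar summary of the feature map: half its squared norm (assumption)."""
    return 0.5 * sum(v * v for row in forward(image) for v in row)


def saliency(image):
    """Absolute gradient of objective(image) with respect to every pixel."""
    h, w = len(image), len(image[0])
    feat = forward(image)
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h - 2):
        for x in range(w - 2):
            # d(0.5 * f^2)/dz equals f where the ReLU is active and 0 elsewhere,
            # so each feature value is scattered back through the filter weights.
            for dy in range(3):
                for dx in range(3):
                    grad[y + dy][x + dx] += feat[y][x] * KERNEL[dy][dx]
    return [[abs(g) for g in row] for row in grad]


def local_maxima(sal, threshold):
    """Pixels above threshold that no 8-connected neighbour exceeds."""
    h, w = len(sal), len(sal[0])
    keypoints = []
    for y in range(h):
        for x in range(w):
            v = sal[y][x]
            if v <= threshold:
                continue
            neighbours = [sal[ny][nx]
                          for ny in range(max(y - 1, 0), min(y + 2, h))
                          for nx in range(max(x - 1, 0), min(x + 2, w))
                          if (ny, nx) != (y, x)]
            if all(v >= n for n in neighbours):
                keypoints.append((y, x))
    return keypoints


image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0  # a single bright blob on a dark background
keypoints = local_maxima(saliency(image), threshold=1.0)
print(keypoints)  # prints [(4, 4)]
```

With a real network, the hand-written backward pass would be replaced by automatic differentiation through the chosen layer, and the threshold and neighbourhood size would be tuning choices rather than the fixed values used here.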

Code Implementations: 2 repos
