CV · Jul 9, 2019

UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor

arXiv:1907.04011v1 · 94 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of building reliable interest point detectors for computer vision applications without human annotation. The contribution is incremental, since it builds on existing unsupervised and deep-learning methods.

The paper tackles the problem of building interest point detectors without consistent ground-truth data. It introduces UnsuperPoint, an unsupervised, deep-learning-based detector and descriptor trained with a self-supervised approach and a novel loss function. UnsuperPoint runs in real time at up to 323 fps and achieves results comparable to or better than the state of the art on the HPatches dataset.

It is hard to create consistent ground truth data for interest points in natural images, since interest points are hard to define clearly and consistently for a human annotator. This makes interest point detectors non-trivial to build. In this work, we introduce an unsupervised deep learning-based interest point detector and descriptor. Using a self-supervised approach, we utilize a siamese network and a novel loss function that enables interest point scores and positions to be learned automatically. The resulting interest point detector and descriptor is UnsuperPoint. We use regression of point positions to 1) make UnsuperPoint end-to-end trainable and 2) incorporate non-maximum suppression in the model. Unlike most trainable detectors, it requires no generation of pseudo ground truth points and no structure-from-motion-generated representations, and the model is learned from only one round of training. Furthermore, we introduce a novel loss function to regularize network predictions to be uniformly distributed. UnsuperPoint runs in real time at 323 frames per second (fps) at a resolution of $224\times320$ and 90 fps at $480\times640$. It performs comparably to or better than the state of the art when measured for speed, repeatability, localization, matching score and homography estimation on the HPatches dataset.
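To make three ideas from the abstract more concrete, here is a minimal plain-Python sketch. It is not the authors' code, and every function name is hypothetical. Assumptions not stated in the text above: the backbone downsamples by 8, so each output cell predicts one point whose offsets lie in [0, 1]; the second siamese branch sees a copy of the image warped by a known homography, and points are paired by nearest neighbour within an assumed 4-pixel threshold; the uniform-distribution penalty compares sorted predictions in [0, 1] against an evenly spaced ramp.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Assumption: the backbone downsamples by 8, so each output cell covers an 8x8 patch.
DOWNSAMPLE = 8


def cells_to_image_points(rel: Sequence[Sequence[Point]],
                          downsample: int = DOWNSAMPLE) -> List[Point]:
    """Map per-cell relative offsets (x_rel, y_rel) in [0, 1] to pixel coordinates.

    Each cell yields exactly one point, so at most one point survives per
    downsample x downsample patch, which behaves like a grid-based NMS.
    """
    points = []
    for r, row in enumerate(rel):
        for c, (x_rel, y_rel) in enumerate(row):
            points.append(((c + x_rel) * downsample, (r + y_rel) * downsample))
    return points


def warp_points(h: Sequence[Sequence[float]], pts: Sequence[Point]) -> List[Point]:
    """Apply a 3x3 homography to 2D points via homogeneous coordinates."""
    out = []
    for x, y in pts:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append((u / w, v / w))
    return out


def match_points(pa: Sequence[Point], pb: Sequence[Point],
                 max_dist: float = 4.0) -> List[Tuple[int, int]]:
    """Pair each (warped) point in A with its nearest point in B if within max_dist.

    max_dist = 4.0 pixels is an assumed threshold for this sketch.
    """
    pairs = []
    for i, a in enumerate(pa):
        best_j, best_d = None, math.inf
        for j, b in enumerate(pb):
            d = math.dist(a, b)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= max_dist:
            pairs.append((i, best_j))
    return pairs


def uniform_distribution_loss(values: Sequence[float]) -> float:
    """Sum of squared gaps between sorted values in [0, 1] and a uniform ramp."""
    v = sorted(values)
    n = len(v)
    if n < 2:
        return 0.0
    # The i-th smallest value is compared with i / (n - 1), i.e. 0, ..., 1.
    return sum((x - i / (n - 1)) ** 2 for i, x in enumerate(v))
```

Because each cell contributes exactly one point in this sketch, no separate NMS pass is needed. The uniform penalty is zero for evenly spaced values and grows when predictions cluster, for example when relative offsets pile up at cell borders.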

Code Implementations: 3 repos