CV · Mar 1, 2021

P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching

arXiv:2103.01055v2 · 77 citations
AI Analysis

This work addresses the challenge of directly matching image pixels to point-cloud points for applications such as visual localization. The contribution is incremental, since it builds on existing learning-based descriptors and detectors.

The paper tackles the problem of establishing fine-grained correspondences between 2D images and 3D point clouds by proposing a dual fully convolutional framework that jointly describes and detects keypoints in a shared latent space, achieving state-of-the-art results for indoor visual localization.

Accurately describing and detecting 2D and 3D keypoints is crucial to establishing correspondences across images and point clouds. Although a plethora of learning-based 2D or 3D local feature descriptors and detectors have been proposed, the derivation of a shared descriptor and joint keypoint detector that directly matches pixels and points remains under-explored by the community. This work takes the initiative to establish fine-grained correspondences between 2D images and 3D point clouds. In order to directly match pixels and points, a dual fully convolutional framework is presented that maps 2D and 3D inputs into a shared latent representation space to simultaneously describe and detect keypoints. Furthermore, an ultra-wide reception mechanism and a novel loss function are designed to mitigate the intrinsic information variations between pixel and point local regions. Extensive experimental results demonstrate that our framework shows competitive performance in fine-grained matching between images and point clouds and achieves state-of-the-art results for the task of indoor visual localization. Our source code will be available at [no-name-for-blind-review].
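Once pixels and points are described in a shared latent space, correspondences can be read off by comparing descriptors directly. The abstract does not state the matching rule, so the following is only a minimal sketch under an assumed, generic choice: mutual nearest-neighbour matching on cosine similarity. The function names `l2_normalize` and `mutual_nearest_matches` and the toy descriptors are hypothetical and are not taken from the paper or its code.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mutual_nearest_matches(pixel_desc, point_desc):
    """Return (pixel_idx, point_idx) pairs that are each other's nearest neighbour.

    Assumption: both descriptor sets already live in the same latent space,
    so a plain cosine similarity between them is meaningful.
    """
    if not pixel_desc or not point_desc:
        return []
    px = [l2_normalize(d) for d in pixel_desc]
    pt = [l2_normalize(d) for d in point_desc]

    # Cosine similarity matrix: rows are pixels, columns are points.
    sim = [[sum(a * b for a, b in zip(p, q)) for q in pt] for p in px]

    # Best point for every pixel, and best pixel for every point.
    best_point = [max(range(len(pt)), key=row.__getitem__) for row in sim]
    best_pixel = [
        max(range(len(px)), key=lambda i, j=j: sim[i][j]) for j in range(len(pt))
    ]

    # Keep only the pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(best_point) if best_pixel[j] == i]


# Toy 2-D descriptors standing in for learned features.
pixel_descriptors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
point_descriptors = [[0.0, 2.0], [3.0, 0.1]]
matches = mutual_nearest_matches(pixel_descriptors, point_descriptors)
print(matches)
```

The mutual check discards the third pixel, whose best point prefers a different pixel. In a localization pipeline the surviving 2D-3D pairs would typically be passed to a robust pose solver, which is outside the scope of this sketch.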

Code Implementations: 1 repo