CV · May 26, 2020

Learning Local Features with Context Aggregation for Visual Localization

arXiv:2005.12880v2 · 2 citations
AI Analysis

This paper addresses the challenge of robust keypoint detection and description in vision applications, though it appears incremental because it builds on existing detect-then-describe strategies.

The paper tackles the problem of learning robust local features for visual localization by fusing low-level textural information with high-level semantic context, achieving state-of-the-art performance in the local feature challenge of the visual localization benchmark.

Keypoint detection and description are fundamental to many vision applications. Most existing methods use a detect-then-describe or detect-and-describe strategy to learn local features without considering their context information. Consequently, it is challenging for these methods to learn robust local features. In this paper, we focus on the fusion of low-level textural information and high-level semantic context information to improve the discriminativeness of local features. Specifically, we first estimate a score map to represent the distribution of potential keypoints according to the quality of descriptors of all pixels. Then, we extract and aggregate multi-scale high-level semantic features under the guidance of the score map. Finally, the low-level local features and high-level semantic features are fused and refined using a residual module. Experiments on the challenging local feature benchmark dataset demonstrate that our method achieves state-of-the-art performance in the local feature challenge of the visual localization benchmark.
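The three steps named in the abstract (score map, score-guided multi-scale aggregation, residual fusion) can be illustrated with a toy sketch in plain Python. This is not the authors' network: the function names score_map, aggregate_context and fuse_residual are hypothetical, the descriptor norm is only an assumed stand-in for "descriptor quality", square-window averaging stands in for the learned multi-scale semantic features, and a fixed tanh step stands in for the learned residual module.

```python
import math

def score_map(desc):
    # desc: H x W grid of descriptor vectors (lists of floats).
    # Assumption: descriptor "quality" is approximated by the L2 norm,
    # rescaled so the best pixel scores 1.0.
    norms = [[math.sqrt(sum(v * v for v in d)) for d in row] for row in desc]
    peak = max(max(row) for row in norms) or 1.0
    return [[n / peak for n in row] for row in norms]

def aggregate_context(desc, scores, radii=(1, 2)):
    # Score-weighted average of descriptors inside square windows of several
    # radii (the "scales"), averaged across scales. Placeholder for learned
    # multi-scale semantic features.
    h, w, dim = len(desc), len(desc[0]), len(desc[0][0])
    context = [[[0.0] * dim for _ in range(w)] for _ in range(h)]
    for r in radii:
        for y in range(h):
            for x in range(w):
                acc, total = [0.0] * dim, 0.0
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        s = scores[yy][xx]
                        total += s
                        for k in range(dim):
                            acc[k] += s * desc[yy][xx][k]
                if total > 0:
                    for k in range(dim):
                        context[y][x][k] += acc[k] / (total * len(radii))
    return context

def fuse_residual(local, context, alpha=0.5):
    # Fuse local and context features by addition, refine with a residual
    # step x + alpha * tanh(x) (an assumed placeholder for a learned module),
    # then L2-normalise each descriptor.
    out = []
    for local_row, context_row in zip(local, context):
        row = []
        for l, c in zip(local_row, context_row):
            x = [a + b for a, b in zip(l, c)]
            x = [v + alpha * math.tanh(v) for v in x]
            norm = math.sqrt(sum(v * v for v in x)) or 1.0
            row.append([v / norm for v in x])
        out.append(row)
    return out

# Toy 3 x 3 "image" with 2-dimensional descriptors.
desc = [[[float(x + 1), float(y)] for x in range(3)] for y in range(3)]
scores = score_map(desc)
features = fuse_residual(desc, aggregate_context(desc, scores))
```

In this sketch, pixels with higher scores contribute more to the context of their neighbours, which is one simple reading of aggregation "under the guidance of the score map"; the paper's actual architecture and training losses are not reproduced here.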

