ContextDesc: Local Descriptor Augmentation with Cross-Modality Context
This paper addresses a limitation of the local feature descriptors used in geometric matching: they describe each keypoint in isolation and ignore the spatial relationships between keypoints.
The paper introduces a unified learning framework that augments existing descriptors with two kinds of context: visual context from high-level image features and geometric context from the distribution of keypoints. The augmentation adds little computational overhead and yields substantial improvements on several large-scale benchmarks covering diverse scenes.
Most existing studies on learning local features focus on patch-based descriptions of individual keypoints while neglecting the spatial relations established by the keypoint locations. In this paper, we go beyond local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates cross-modality contextual information, including (i) visual context from high-level image representations and (ii) geometric context from the 2D keypoint distribution. Moreover, we propose an effective N-pair loss that avoids empirical hyper-parameter search and improves convergence. The proposed augmentation scheme is lightweight relative to the raw local feature description, yet it yields remarkable improvements on several large-scale benchmarks with diverse scenes, demonstrating both strong practicality and generalization ability in geometric matching applications.
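For reference, here is a minimal sketch of a standard softmax N-pair loss over a batch of matched descriptor pairs, written in Python with NumPy. It is illustrative only: the abstract does not give the exact formulation ContextDesc uses, and the function name `n_pair_loss` is hypothetical. Each anchor is scored against every positive in the batch. The matching pair is treated as the correct class in a cross-entropy, and the other N-1 positives serve as negatives, so no margin has to be tuned as it would be in a triplet loss.

```python
import numpy as np


def n_pair_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """Standard softmax N-pair loss (illustrative sketch, not ContextDesc's exact loss).

    anchors, positives: (N, D) arrays; row i of each forms a matching pair.
    For anchor i, every positive j != i in the batch acts as a negative.
    """
    # L2-normalize so that dot products are cosine similarities in [-1, 1]
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (N, N) similarity matrix; the diagonal holds the matching pairs
    sim = a @ p.T

    # Numerically stable row-wise log-softmax
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching index as the target class for each anchor
    return float(-np.mean(np.diag(log_probs)))


# Usage: perfectly matched orthogonal descriptors vs. deliberately mismatched ones
x = np.eye(4)
print(n_pair_loss(x, x))                       # lower loss: pairs are aligned
print(n_pair_loss(x, np.roll(x, 1, axis=0)))  # higher loss: pairs are shuffled
```

Many practical variants multiply the similarity matrix by a temperature or scale factor before the softmax. The sketch omits it to stay free of tunable hyper-parameters. As a consequence, the loss cannot approach zero, because cosine similarities are bounded in [-1, 1].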