CV · Jul 30, 2025

Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques

arXiv:2507.22791v1 · 4 citations · h-index: 11
Originality: Synthesis-oriented
AI Analysis

The paper provides a comprehensive overview for computer vision researchers and practitioners, but as a review paper its contribution is incremental.

This survey reviews feature matching techniques across various modalities, highlighting that traditional methods like SIFT struggle with modality gaps, while deep learning approaches such as SuperPoint and LoFTR improve robustness and adaptability.

Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and emphasizing contemporary deep learning approaches across various modalities, including RGB images, depth images, 3D point clouds, LiDAR scans, medical images, and vision-language interactions. Traditional methods, leveraging detectors like Harris corners and descriptors such as SIFT and ORB, demonstrate robustness under moderate intra-modality variations but struggle with significant modality gaps. Contemporary deep learning-based methods, exemplified by the CNN-based detector-descriptor SuperPoint and the detector-free, transformer-based LoFTR, substantially improve robustness and adaptability across modalities. We highlight modality-aware advancements, such as geometric and depth-specific descriptors for depth images, sparse and dense learning methods for 3D point clouds, attention-enhanced neural networks for LiDAR scans, and specialized solutions like the MIND descriptor for complex medical image matching. Cross-modal applications, particularly in medical image registration and vision-language tasks, underscore the evolution of feature matching to handle increasingly diverse data interactions.

