CV | Apr 15, 2024

XoFTR: Cross-modal Feature Matching Transformer

Önder Tuzcuoğlu, Aybora Köksal, Buğra Sofu, Sinan Kalkan, A. Aydın Alatan

arXiv:2404.09692v122.774 citations | h-index: 13 | Has Code | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Originality: Incremental advance

AI Analysis

The work addresses a domain-specific problem relevant to applications such as surveillance and autonomous systems in adverse conditions, but it is an incremental advance because it builds on existing cross-modal matching techniques.

The paper tackles local feature matching between thermal infrared and visible images, which is challenging because of texture and intensity differences between the two modalities, and shows that the proposed method outperforms existing methods on benchmarks.

We introduce XoFTR, a cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions but present difficulties in matching due to significant texture and intensity differences. Current hand-crafted and learning-based methods for visible-TIR matching fall short in handling viewpoint, scale, and texture diversities. To address this, XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences. Additionally, we introduce a refined matching pipeline that adjusts for scale discrepancies and enhances match reliability through sub-pixel level refinement. To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms existing methods on many benchmarks.
