CV | MM | Jun 28, 2023

$\mathbf{C}^2$Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection

arXiv:2306.16175v3 | 184 citations | h-index: 28 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses robust object detection for around-the-clock applications like surveillance by improving fusion of RGB and infrared modalities, but it is incremental as it builds on existing transformer and detector frameworks.

The paper tackles modality miscalibration and fusion imprecision in RGB-infrared object detection by proposing $\mathrm{C}^2$Former, a transformer-based method that uses inter-modality cross-attention to achieve calibrated and complementary features, resulting in robust detection validated on DroneVehicle and KAIST datasets.

Object detection on visible (RGB) and infrared (IR) images, as an emerging solution to facilitate robust detection for around-the-clock applications, has received extensive attention in recent years. With the help of IR images, object detectors have become more reliable and robust in practical applications by using combined RGB-IR information. However, existing methods still suffer from modality miscalibration and fusion imprecision. Since the transformer has a powerful capability to model pairwise correlations between different features, in this paper we propose a novel Calibrated and Complementary Transformer, called $\mathrm{C}^2$Former, to address these two problems simultaneously. In $\mathrm{C}^2$Former, we design an Inter-modality Cross-Attention (ICA) module to obtain calibrated and complementary features by learning the cross-attention relationship between the RGB and IR modalities. To reduce the computational cost of computing global attention in ICA, an Adaptive Feature Sampling (AFS) module is introduced to reduce the dimension of the feature maps. Because $\mathrm{C}^2$Former operates in the feature domain, it can be embedded into existing RGB-IR object detectors via the backbone network. Thus, one single-stage and one two-stage object detector, both incorporating our $\mathrm{C}^2$Former, are constructed to evaluate its effectiveness and versatility. With extensive experiments on the DroneVehicle and KAIST RGB-IR datasets, we verify that our method can fully utilize the RGB-IR complementary information and achieve robust detection results. The code is available at https://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detection.git.
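
The idea of cross-attention between two modalities, preceded by a reduction of the feature-map size, can be illustrated with a small sketch. This is not the authors' ICA or AFS design, which the abstract does not specify. It is generic scaled dot-product cross-attention with no learned projections, and plain average pooling stands in for AFS. The function names (`avg_pool_tokens`, `cross_attention`, `fuse`), the `stride` parameter, and the residual addition are all hypothetical choices made for illustration. Each feature map is assumed to be flattened into a list of token vectors.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def avg_pool_tokens(tokens, stride):
    """Average every `stride` consecutive tokens.

    Assumption: a simple stand-in for the paper's AFS module, used only
    to shrink the token count before global attention.
    """
    pooled = []
    for start in range(0, len(tokens), stride):
        group = tokens[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def cross_attention(queries, keys_values):
    """Each query token attends over all tokens of the other modality.

    Learned query/key/value projections are omitted for brevity.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(keys_values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(dim)])
    return out


def fuse(rgb_tokens, ir_tokens, stride=2):
    """Pool both modalities, then let each one read from the other."""
    rgb_small = avg_pool_tokens(rgb_tokens, stride)
    ir_small = avg_pool_tokens(ir_tokens, stride)
    rgb_from_ir = cross_attention(rgb_small, ir_small)  # RGB queries, IR keys/values
    ir_from_rgb = cross_attention(ir_small, rgb_small)  # IR queries, RGB keys/values
    # Residual addition (an assumption) keeps each modality's own features.
    rgb_out = [[a + b for a, b in zip(x, y)] for x, y in zip(rgb_small, rgb_from_ir)]
    ir_out = [[a + b for a, b in zip(x, y)] for x, y in zip(ir_small, ir_from_rgb)]
    return rgb_out, ir_out
```

With N tokens per modality, global attention costs on the order of N² score computations, so pooling by `stride` first cuts that cost by roughly `stride`². That is the kind of saving a sampling step before attention is meant to provide.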

Code Implementations: 2 repos
