Wavelet-guided Misalignment-aware Network for Visible-Infrared Object Detection
This work addresses cross-modal misalignment in object detection for applications such as surveillance and autonomous systems. The contribution is incremental, though it introduces new method components (wavelet-based multi-frequency analysis and modality-aware fusion).
The paper tackles visible-infrared object detection, whose performance is limited by misalignments arising from resolution disparities and modality inconsistencies, and proposes WMNet, which achieves state-of-the-art performance on the DVTOD, DroneVehicle, and M3FD datasets.
Visible-infrared object detection aims to improve detection robustness by exploiting the complementary information in paired visible and infrared images. However, its performance is often limited by misalignments caused by resolution disparities, spatial displacements, and modality inconsistencies. To address this issue, we propose the Wavelet-guided Misalignment-aware Network (WMNet), a unified framework designed to adaptively handle different cross-modal misalignment patterns. WMNet combines wavelet-based multi-frequency analysis with modality-aware fusion mechanisms to improve the alignment and integration of cross-modal features. By jointly exploiting low- and high-frequency information and introducing adaptive guidance across modalities, WMNet alleviates the adverse effects of noise, illumination variation, and spatial misalignment. Furthermore, it enhances the representation of salient target features while suppressing spurious or misleading information, thereby promoting more accurate and robust detection. Extensive evaluations on the DVTOD, DroneVehicle, and M3FD datasets demonstrate that WMNet achieves state-of-the-art performance on misaligned cross-modal object detection tasks, confirming its effectiveness and practical applicability.
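The abstract does not say how WMNet's wavelet branch is implemented. As a rough illustration of what separating an input into low- and high-frequency components means, the sketch below applies a single-level 2D Haar transform to two aligned, same-size arrays and fuses them with a classic fixed rule: average the low-frequency band and keep the larger-magnitude high-frequency coefficient. The Haar wavelet, this fusion rule, and the names `haar_dwt2`, `haar_idwt2`, and `fuse_wavelet` are assumptions chosen for illustration. This is not WMNet's learned, modality-aware fusion.

```python
# Hypothetical sketch: single-level orthonormal 2D Haar transform plus a
# classic fixed wavelet-fusion rule. NOT the WMNet method; illustration only.

def haar_dwt2(img):
    """Split an even-sized 2D list into (LL, LH, HL, HH) subbands."""
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # low frequency: local average
            r_lh.append((a + b - c - d) / 2)  # detail: top row minus bottom row
            r_hl.append((a - b + c - d) / 2)  # detail: left column minus right column
            r_hh.append((a - b - c + d) / 2)  # detail: diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, rebuilding the full-resolution 2D list."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, t, u, v = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + t + u + v) / 2
            out[2 * i][2 * j + 1] = (s + t - u - v) / 2
            out[2 * i + 1][2 * j] = (s - t + u - v) / 2
            out[2 * i + 1][2 * j + 1] = (s - t - u + v) / 2
    return out


def fuse_wavelet(visible, infrared):
    """Fixed-rule fusion of two aligned, same-size arrays (assumed rule)."""
    assert len(visible) == len(infrared) and len(visible[0]) == len(infrared[0])
    sv, si = haar_dwt2(visible), haar_dwt2(infrared)
    # Low-frequency band: plain average of the two modalities.
    ll = [[(x + y) / 2 for x, y in zip(rv, ri)] for rv, ri in zip(sv[0], si[0])]
    # High-frequency bands: keep whichever coefficient has the larger magnitude.
    details = [
        [[x if abs(x) >= abs(y) else y for x, y in zip(rv, ri)]
         for rv, ri in zip(bv, bi)]
        for bv, bi in zip(sv[1:], si[1:])
    ]
    return haar_idwt2(ll, *details)
```

For example, `haar_dwt2([[1, 2], [3, 4]])` returns `([[5.0]], [[-2.0]], [[-1.0]], [[0.0]])`, and passing those subbands to `haar_idwt2` recovers the original array. The fixed average/max-magnitude rule is only a stand-in for the adaptive cross-modal guidance the abstract describes. It also assumes the inputs are already spatially aligned, which is exactly the assumption WMNet is designed to relax.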