MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery
This work addresses the problem of small object detection for UAV applications, offering an incremental improvement in accuracy and speed.
The paper tackles small object detection in UAV imagery by introducing MambaRefine-YOLO, which uses a dual-modality fusion module and hierarchical feature aggregation to achieve a state-of-the-art mAP of 83.2% on the DroneVehicle dataset, a 7.9% improvement over the baseline.
Small object detection in Unmanned Aerial Vehicle (UAV) imagery is a persistent challenge, hindered by low resolution and background clutter. While fusing RGB and infrared (IR) data offers a promising solution, existing methods often struggle with the trade-off between effective cross-modal interaction and computational efficiency. In this letter, we introduce MambaRefine-YOLO. Its core contributions are a Dual-Gated Complementary Mamba Fusion Module (DGC-MFM) that adaptively balances the RGB and IR modalities through illumination-aware and difference-aware gating mechanisms, and a Hierarchical Feature Aggregation Neck (HFAN) that uses a ``refine-then-fuse'' strategy to enhance multi-scale features. Our experiments validate both components. On the dual-modality DroneVehicle dataset, the full model achieves a state-of-the-art mAP of 83.2%, an improvement of 7.9% over the baseline. On the single-modality VisDrone dataset, a variant using only the HFAN also shows significant gains, demonstrating that the neck applies beyond the dual-modality setting. The resulting model offers a favorable balance between accuracy and speed, making it well suited to real-world UAV applications.
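To make the idea of illumination-aware and difference-aware gating more concrete, the following toy sketch blends RGB and IR feature values with two gates. It is not the DGC-MFM itself: it contains no Mamba (state-space) component, and every function name, parameter, and formula here (`illumination_gate`, `difference_gate`, `gated_fuse`, `midpoint`, `sharpness`, `scale`) is a hypothetical choice made for illustration, not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def illumination_gate(rgb_luma, midpoint=0.5, sharpness=8.0):
    """Scalar weight in (0, 1) for trusting RGB.

    Assumption: brightness values lie in [0, 1]; a bright scene yields a
    weight near 1 (favor RGB), a dark scene a weight near 0 (favor IR).
    """
    mean_luma = sum(rgb_luma) / len(rgb_luma)
    return sigmoid(sharpness * (mean_luma - midpoint))


def difference_gate(rgb_feat, ir_feat, scale=4.0):
    """Per-element weight in [0, 1) that grows where the modalities disagree."""
    return [math.tanh(scale * abs(r - i)) for r, i in zip(rgb_feat, ir_feat)]


def gated_fuse(rgb_feat, ir_feat, rgb_luma):
    """Blend two feature vectors using both gates (illustrative only)."""
    w_rgb = illumination_gate(rgb_luma)
    diff = difference_gate(rgb_feat, ir_feat)
    fused = []
    for r, i, g in zip(rgb_feat, ir_feat, diff):
        # Illumination-weighted average of the two modalities.
        base = w_rgb * r + (1.0 - w_rgb) * i
        # Where the modalities disagree, move toward the stronger response
        # so that complementary evidence is not averaged away.
        fused.append(base + g * (max(r, i) - base))
    return fused


if __name__ == "__main__":
    # Dark scene: the object is invisible in RGB but clear in IR.
    print(gated_fuse([0.0, 0.2], [1.0, 0.2], rgb_luma=[0.1, 0.1, 0.1]))
```

In this sketch, identical RGB and IR features pass through unchanged, while in a dark scene the fused value stays close to the IR response. A learned module would replace the fixed `sigmoid` and `tanh` rules with trainable layers.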