CV, AI | Aug 13, 2025

COXNet: Cross-Layer Fusion with Adaptive Alignment and Scale Integration for RGBT Tiny Object Detection

Peiran Peng, Tingfa Xu, Liqiang Song, Mengqi Zhu, Yuqiang Fang, Jianan Li

arXiv:2508.09533v1 | 3 citations | h-index: 21 | IEEE Transactions on Circuits and Systems for Video Technology (Print)

Originality: Highly original

AI Analysis

This work addresses the critical challenge of tiny object detection in RGBT imagery for applications such as surveillance and autonomous navigation, representing an incremental advance with measurable benchmark gains.

The paper tackles the problem of detecting tiny objects in multimodal RGBT imagery, particularly in drone-based scenarios with challenges such as spatial misalignment and low-light conditions, and reports a 3.32% mAP50 improvement over state-of-the-art methods on the RGBTDronePerson dataset.
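The headline metric, mAP50, counts a detection as correct when its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The sketch below illustrates that criterion for a single class and a single image, assuming axis-aligned `(x1, y1, x2, y2)` boxes and simplified COCO-style greedy matching. It is not the benchmark's official evaluation code, and the helper names `iou` and `ap50` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap50(dets: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5) -> float:
    """Hypothetical single-class, single-image AP at IoU >= thr.

    dets holds (score, box) pairs; each detection, in descending score order,
    greedily claims the best still-unmatched ground truth.
    """
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    matched = [False] * len(gts)
    tp_flags = []
    for _, box in dets:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if not matched[j]:
                o = iou(box, g)
                if o > best:
                    best, best_j = o, j
        if best >= thr:
            matched[best_j] = True
            tp_flags.append(1)  # true positive
        else:
            tp_flags.append(0)  # false positive
    # Build the precision/recall curve over the ranked detections.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts) if gts else 0.0)
    # All-point interpolation: make precision non-increasing from the right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For a 4x4-pixel box, a one-pixel shift still gives IoU 12/20 = 0.6, but a two-pixel shift drops it to 8/24 (about 0.33), below the 0.5 threshold. This sensitivity is one common reason tiny-object detectors look beyond plain IoU when assigning labels.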

Detecting tiny objects in multimodal Red-Green-Blue-Thermal (RGBT) imagery is a critical challenge in computer vision, particularly in surveillance, search and rescue, and autonomous navigation. Drone-based scenarios exacerbate these challenges due to spatial misalignment, low-light conditions, occlusion, and cluttered backgrounds. Current methods struggle to effectively leverage the complementary information between visible and thermal modalities. We propose COXNet, a novel framework for RGBT tiny object detection that addresses these issues through three core innovations: i) the Cross-Layer Fusion Module, which fuses high-level visible and low-level thermal features for enhanced semantic and spatial accuracy; ii) the Dynamic Alignment and Scale Refinement module, which corrects cross-modal spatial misalignments and preserves multi-scale features; and iii) an optimized label assignment strategy using the GeoShape Similarity Measure for better localization. COXNet achieves a 3.32% mAP50 improvement over state-of-the-art methods on the RGBTDronePerson dataset, demonstrating its effectiveness for robust detection in complex environments.
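The abstract does not specify the internals of the Cross-Layer Fusion Module, so the following is only a hypothetical toy sketch of the general idea it names: combining a coarse, high-level visible map with a finer, low-level thermal map. It assumes single-channel maps, nearest-neighbour upsampling, and a per-pixel sigmoid gate whose scalar weights `w_vis`, `w_th`, and `bias` stand in for learned layers. The function names are illustrative and are not the authors' API.

```python
import math
from typing import List

Grid = List[List[float]]  # single-channel H x W feature map (toy stand-in for a tensor)


def upsample_nearest(x: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling so a coarse map matches a finer one."""
    return [[x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def cross_layer_fuse(vis_high: Grid, th_low: Grid, w_vis: float = 1.0,
                     w_th: float = 1.0, bias: float = 0.0) -> Grid:
    """Hypothetical gated fusion of a coarse visible map with a fine thermal map.

    Per pixel, g = sigmoid(w_vis * v + w_th * t + bias) decides how much of the
    upsampled visible (semantic) value to keep versus the thermal (spatial
    detail) value. Assumes th_low is an integer multiple of vis_high in size.
    """
    factor = len(th_low) // len(vis_high)
    vis_up = upsample_nearest(vis_high, factor)
    out = []
    for vis_row, th_row in zip(vis_up, th_low):
        row = []
        for v, t in zip(vis_row, th_row):
            g = sigmoid(w_vis * v + w_th * t + bias)
            row.append(g * v + (1.0 - g) * t)
        out.append(row)
    return out
```

With `w_vis = w_th = 0.0` the gate is a constant 0.5 and the fusion reduces to a plain average; nonzero weights let the gate lean toward whichever modality responds more strongly at each location. A real module would operate on multi-channel tensors with learned convolutions.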

