YOLO-PRO: Enhancing Instance-Specific Object Detection with Full-Channel Global Self-Attention
This work addresses the problem of improving object detection accuracy and efficiency for deployment on edge devices, representing an incremental advancement over existing YOLO frameworks.
The paper targets two limitations of YOLO-style detectors: diminished instance discriminability in conventional bottleneck structures and computational redundancy in decoupled heads. It proposes two modules to address them, the Instance-Specific Bottleneck with full-channel global self-attention and the Instance-Specific Asymmetric Decoupled Head. The resulting YOLO-PRO framework reports state-of-the-art performance on MS-COCO, surpassing YOLOv8 by 1.0-1.6% AP at the N/S/M/L/X scales and YOLO11 by 0.1-0.5% AP at the N/M/L/X scales.
This paper addresses two inherent limitations of object detection frameworks: conventional bottleneck structures, whose overemphasis on batch statistics diminishes instance discriminability, and decoupled heads, which introduce computational redundancy. It proposes two novel modules: the Instance-Specific Bottleneck with full-channel global self-attention (ISB) and the Instance-Specific Asymmetric Decoupled Head (ISADH). The ISB module reconstructs feature maps to establish an efficient full-channel global attention mechanism by fusing batch-statistical and instance-specific features. Complementing this, the ISADH module adopts an asymmetric decoupled architecture that enables hierarchical multi-dimensional feature integration through dual-stream fusion of batch and instance representations. Extensive experiments on the MS-COCO benchmark demonstrate that deploying ISB and ISADH together in the YOLO-PRO framework achieves state-of-the-art performance across all computational scales. Specifically, YOLO-PRO surpasses YOLOv8 by 1.0-1.6% AP at the N/S/M/L/X scales and outperforms YOLO11 by 0.1-0.5% AP at the N/M/L/X scales, while maintaining competitive computational efficiency. This work provides practical insights for developing high-precision detectors deployable on edge devices.
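The abstract does not specify how the ISB module computes its attention, so the following is only a generic illustration of what attention across channels of a single image can look like, not the authors' design. It assumes each channel of one image's feature map is flattened into a vector and treated as a token, so that the attention weights depend on that image alone rather than on batch statistics. The function name `channel_self_attention` and the toy inputs are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_self_attention(feature_map):
    """Generic channel-wise self-attention for ONE image (illustrative only).

    feature_map: list of C channels, each a flat list of H*W activations.
    Returns (weights, out): a C x C attention matrix and the re-mixed channels.
    Assumption: queries, keys, and values are the raw channels, with no
    learned projections, to keep the sketch minimal.
    """
    n = len(feature_map[0])
    scale = 1.0 / math.sqrt(n)
    # Scaled dot-product affinity between every pair of channels.
    scores = [
        [scale * sum(a * b for a, b in zip(query, key)) for key in feature_map]
        for query in feature_map
    ]
    weights = [softmax(row) for row in scores]
    # Each output channel is a weighted mix of all input channels.
    out = [
        [sum(w * channel[i] for w, channel in zip(row, feature_map)) for i in range(n)]
        for row in weights
    ]
    return weights, out


if __name__ == "__main__":
    # Toy feature map: 3 channels, 2x2 spatial grid flattened to 4 values.
    image = [[1.0, 0.0, 2.0, 1.0], [0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 0.0, 3.0]]
    weights, out = channel_self_attention(image)
    for row in weights:
        print([round(w, 3) for w in row])
```

Because the weights are computed from the image's own activations, two different images yield two different channel-mixing matrices, which is one plausible reading of "instance-specific" in contrast to statistics shared across a batch.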