Fine-Grained Dynamic Head for Object Detection
This work aims to improve object detection by better handling scale variance, that is, the wide range of sizes at which objects appear in images.
This paper addresses the issue of scale variance in object detection by proposing a fine-grained dynamic head that selects pixel-level combinations of FPN features for each instance. This approach enhances multi-scale feature representation and incorporates a spatial gate with a new activation function to reduce computational complexity via spatially sparse convolutions.
The Feature Pyramid Network (FPN) presents a remarkable approach to alleviating scale variance in object representation by performing instance-level assignments. Nevertheless, this strategy ignores the distinct characteristics of different sub-regions in an instance. To this end, we propose a fine-grained dynamic head that conditionally selects a pixel-level combination of FPN features from different scales for each instance, which further unlocks the capacity of multi-scale feature representation. Moreover, we design a spatial gate with a new activation function that dramatically reduces computational complexity through spatially sparse convolutions. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks. Code is available at https://github.com/StevenGrove/DynamicHead.
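The idea of a pixel-level combination of multi-scale features can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the nearest-neighbour resizing, the use of precomputed gate logits, and the clipped-tanh gate are all assumptions made for illustration, since the abstract does not specify the activation function or how the gates are predicted.

```python
import numpy as np

def gate_activation(logits):
    # Illustrative choice (an assumption): tanh clipped at zero, so every
    # non-positive logit maps to an exact 0 and that position can be skipped.
    return np.maximum(0.0, np.tanh(logits))

def resize_nearest(feat, out_h, out_w):
    # Nearest-neighbour resize of a (C, H, W) map to (C, out_h, out_w).
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows[:, None], cols[None, :]]

def combine_pixelwise(features, gate_logits, target_idx):
    # features: list of (C, H_s, W_s) maps, one per pyramid scale.
    # gate_logits: list of (H_t, W_t) maps at the target resolution
    #   (hypothetical inputs; a real head would predict them from features).
    channels, out_h, out_w = features[target_idx].shape
    out = np.zeros((channels, out_h, out_w))
    gates = []
    for feat, logits in zip(features, gate_logits):
        gate = gate_activation(logits)
        resized = resize_nearest(feat, out_h, out_w)
        # Each pixel weights each scale by its own gate value.
        out += gate[None, :, :] * resized
        gates.append(gate)
    # Fraction of (scale, pixel) positions with a non-zero gate.
    active_fraction = float(np.mean(np.stack(gates) > 0))
    return out, active_fraction

rng = np.random.default_rng(0)
features = [rng.normal(size=(2, s, s)) for s in (8, 4, 2)]
gate_logits = [rng.normal(size=(4, 4)) for _ in features]
fused, active = combine_pixelwise(features, gate_logits, target_idx=1)
print(fused.shape, round(active, 2))
```

This dense sketch computes every position and only reports the fraction of active gates. A spatially sparse implementation would instead skip the positions whose gate is exactly zero, which is where the computational savings would come from.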