Measuring the Impact of Rotation Equivariance on Aerial Object Detection
Efficient and accurate object detection in aerial imagery is crucial for applications such as surveillance and mapping. This work introduces a detector that combines strict rotation equivariance with a reduced parameter count.
To handle the arbitrary orientation of objects in aerial images, the paper implements a strictly rotation-equivariant backbone and neck network and proposes a multi-branch head network that reduces parameters while improving accuracy. The resulting detector achieves state-of-the-art performance on the DOTA-v1.0, DOTA-v1.5, and DIOR-R datasets with a low parameter count.
Because objects in aerial images appear at arbitrary orientations, rotation equivariance is a critical property for aerial object detectors. However, recent studies on rotation-equivariant aerial object detection remain scarce. Most detectors rely on data augmentation so that the model learns approximately rotation-equivariant features. A few detectors have constructed rotation-equivariant networks, but because typical downsampling operations break strict rotation equivariance, these networks achieve only approximately rotation-equivariant backbones. Whether strict rotation equivariance is necessary for aerial image object detection remains an open question. In this paper, we implement a strictly rotation-equivariant backbone and neck network with a more advanced network structure and compare it with approximately rotation-equivariant networks to quantitatively measure the impact of rotation equivariance on the performance of aerial image detectors. Additionally, leveraging the inherently grouped nature of rotation-equivariant features, we propose a multi-branch head network that reduces the parameter count while improving detection accuracy. Combining these improvements, we propose the Multi-branch head rotation-equivariant single-stage Detector (MessDet), which achieves state-of-the-art performance on the challenging aerial image datasets DOTA-v1.0, DOTA-v1.5, and DIOR-R with an exceptionally low parameter count.
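
The claim that typical downsampling breaks strict rotation equivariance can be illustrated with a small, self-contained check. The sketch below is not taken from the paper. It assumes a single-channel square feature map, 90-degree rotations only, and plain stride-2 subsampling as a stand-in for a strided layer. The helper names (`rot90`, `subsample`, `commutes_with_rotation`) are hypothetical. It tests whether rotating and then downsampling gives the same result as downsampling and then rotating.

```python
def rot90(grid):
    """Rotate a square 2D list by 90 degrees counterclockwise."""
    n = len(grid)
    return [[grid[j][n - 1 - i] for j in range(n)] for i in range(n)]


def subsample(grid, stride=2):
    """Keep every `stride`-th row and column, starting at index 0.

    This mimics the sampling grid of a strided layer (assumption: no
    padding, no filtering, sampling anchored at the top-left corner).
    """
    return [row[::stride] for row in grid[::stride]]


def commutes_with_rotation(n, stride=2):
    """Return True if subsampling an n x n map commutes with rot90."""
    # Distinct values per cell, so any misalignment is detected.
    x = [[i * n + j for j in range(n)] for i in range(n)]
    rotate_then_sample = subsample(rot90(x), stride)
    sample_then_rotate = rot90(subsample(x, stride))
    return rotate_then_sample == sample_then_rotate


even_ok = commutes_with_rotation(8)  # sampling grid is off-center
odd_ok = commutes_with_rotation(9)   # sampling grid is centered
print("8x8 map, stride 2 commutes with rotation:", even_ok)
print("9x9 map, stride 2 commutes with rotation:", odd_ok)
```

In this toy setting the check fails for the 8x8 map and passes for the 9x9 map. With stride 2 anchored at the top-left corner, the sampled positions are symmetric about the map center only when the side length minus one is divisible by the stride. How the paper itself restores strict equivariance in the backbone and neck is not stated in the abstract, so this sketch shows only the failure mode, not the remedy.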