CVDec 13, 2021

Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images

arXiv:2112.06701v155 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of robust object detection in aerial images for applications like surveillance or mapping, but it is incremental as it builds on existing anchor-based and anchor-free methods.

The paper tackles the challenge of small object detection with appearance degradation in aerial images by proposing a Dynamic Enhancement Anchor (DEA) network, which achieves state-of-the-art performance with improvements such as 0.40% mAP on DOTA for oriented detection and 80.37% mAP with ReDet baseline.

Object detection has made tremendous strides in computer vision. Small object detection with appearance degradation is a prominent challenge, especially for aerial observations. To collect sufficient positive/negative samples for heuristic training, most object detectors preset region anchors in order to calculate Intersection-over-Union (IoU) against the ground-truthed data. In this case, small objects are frequently abandoned or mislabeled. In this paper, we present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator. Different from the other state-of-the-art techniques, the proposed network leverages a sample discriminator to realize interactive sample screening between an anchor-based unit and an anchor-free unit to generate eligible samples. Besides, multi-task joint training with a conservative anchor-based inference scheme enhances the performance of the proposed model while reducing computational complexity. The proposed scheme supports both oriented and horizontal object detection tasks. Extensive experiments on two challenging aerial benchmarks (i.e., DOTA and HRSC2016) indicate that our method achieves state-of-the-art performance in accuracy with moderate inference speed and computational overhead for training. On DOTA, our DEA-Net which integrated with the baseline of RoI-Transformer surpasses the advanced method by 0.40% mean-Average-Precision (mAP) for oriented object detection with a weaker backbone network (ResNet-101 vs ResNet-152) and 3.08% mean-Average-Precision (mAP) for horizontal object detection with the same backbone. Besides, our DEA-Net which integrated with the baseline of ReDet achieves the state-of-the-art performance by 80.37%. On HRSC2016, it surpasses the previous best model by 1.1% using only 3 horizontal anchors.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes