CV · Mar 22, 2021

Optimization for Arbitrary-Oriented Object Detection via Representation Invariance Loss

Qi Ming, Lingjuan Miao, Zhiqiang Zhou, Xue Yang, Yunpeng Dong

arXiv:2103.11636v315.1104 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in oriented object detection for applications like remote sensing and scene text analysis, offering an incremental improvement over existing methods.

The paper tackles representation ambiguity in arbitrary-oriented object detection, which causes suboptimal regression and an inconsistency between the loss and the localization accuracy. It proposes a Representation Invariance Loss (RIL) that treats the multiple representations of an object as equivalent local minima and uses Hungarian matching to obtain the optimal regression strategy. The authors report consistent and substantial improvement on remote sensing and scene text datasets.

Arbitrary-oriented objects exist widely in natural scenes, and oriented object detection has therefore received extensive attention in recent years. Mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent rotating objects. However, these methods suffer from representation ambiguity in the definition of oriented objects, which leads to suboptimal regression optimization and an inconsistency between the loss metric and the localization accuracy of the predictions. In this paper, we propose a Representation Invariance Loss (RIL) to optimize bounding box regression for rotating objects. Specifically, RIL treats the multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima. The Hungarian matching algorithm is then adopted to obtain the optimal regression strategy. We also propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contribution in the OBB representation. Extensive experiments on remote sensing datasets and scene text datasets show that our method achieves consistent and substantial improvement. The source code and trained models are available at https://github.com/ming71/RIDet.
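The idea of regressing toward the nearest of several equivalent representations can be illustrated with a minimal sketch. This is not the authors' implementation (see the RIDet repository for that). The function names, the plain L1 distance, the angle convention in degrees, and the particular set of OBB equivalents are all assumptions made for illustration. For the four vertices of a QBB, the sketch replaces the Hungarian algorithm with an exhaustive search over vertex orderings, which finds the same minimum-cost assignment at this size.

```python
from itertools import permutations


def l1(a, b):
    """Plain L1 distance between two equal-length sequences (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def obb_equivalents(box):
    """Hypothetical set of parameterizations describing the same rectangle.

    box = (cx, cy, w, h, theta_deg). Rotating by 180 degrees, or swapping
    w and h while rotating by 90 degrees, leaves the rectangle unchanged.
    """
    cx, cy, w, h, theta = box
    return [
        (cx, cy, w, h, theta),
        (cx, cy, w, h, theta + 180.0),
        (cx, cy, w, h, theta - 180.0),
        (cx, cy, h, w, theta + 90.0),
        (cx, cy, h, w, theta - 90.0),
    ]


def obb_invariant_loss(pred, gt):
    """Regress toward whichever equivalent ground-truth form is closest."""
    return min(l1(pred, eq) for eq in obb_equivalents(gt))


def qbb_matched_loss(pred_pts, gt_pts):
    """Minimum-cost one-to-one vertex assignment between two quadrilaterals.

    Exhaustive search over orderings stands in for Hungarian matching here;
    both return the optimal assignment cost.
    """
    n = len(gt_pts)
    return min(
        sum(l1(pred_pts[i], gt_pts[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )


if __name__ == "__main__":
    # Same rectangle, two parameterizations: (w=2, h=4, 80 deg) vs (w=4, h=2, -10 deg).
    pred_obb = (0.0, 0.0, 2.0, 4.0, 80.0)
    gt_obb = (0.0, 0.0, 4.0, 2.0, -10.0)
    print("naive OBB L1:", l1(pred_obb, gt_obb))
    print("invariant OBB loss:", obb_invariant_loss(pred_obb, gt_obb))

    # Same unit square, vertices listed from a different starting corner.
    gt_quad = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    pred_quad = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
    print("naive QBB L1:", sum(l1(p, g) for p, g in zip(pred_quad, gt_quad)))
    print("matched QBB loss:", qbb_matched_loss(pred_quad, gt_quad))
```

In both toy cases the prediction localizes the object perfectly, yet a naive parameter-wise loss is large (94 for the OBB pair, 8 for the QBB pair), while the matched loss is 0. This is the inconsistency between loss value and localization accuracy that the abstract describes.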

