On the Importance of Large Objects in CNN Based Object Detection Algorithms
The paper addresses performance variability in object detection for computer vision applications, but the contribution is incremental: it modifies an existing training loss.
The paper tackles uneven object detection performance across object sizes by adding a weighting term, based on object area, to the training loss. The reported result is improved detection scores across all object sizes: +2 p.p. mAP on small objects, +2 p.p. on medium, and +4 p.p. on large on COCO val 2017 with InternImage-T.
Object detection models, a prominent class of machine learning algorithms, aim to identify and precisely locate objects in images or videos. However, their performance can be uneven, sometimes because of object sizes and the quality of the images and labels used for training. In this paper, we highlight the importance of large objects in learning features that are critical for all sizes. Given these findings, we propose to introduce a weighting term into the training loss. This term is a function of the object area. We show that giving more weight to large objects leads to improved detection scores across all object sizes, and therefore to an overall improvement in object detector performance (+2 p.p. of mAP on small objects, +2 p.p. on medium, and +4 p.p. on large on COCO val 2017 with InternImage-T). Additional experiments and ablation studies with different models and on a different dataset further confirm the robustness of our findings.
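To make the idea of an area-dependent weighting term concrete, the following is a minimal sketch, not the authors' implementation. The abstract only states that the weight is a function of the object area; the logarithmic form `log(1 + area)`, the normalization to a mean weight of 1, and the names `area_weights` and `weighted_loss` are all assumptions chosen for illustration.

```python
import math


def area_weights(boxes, normalize=True):
    """Return one weight per box, increasing with box area.

    boxes: list of (x_min, y_min, x_max, y_max) tuples in pixels.
    The log(1 + area) form is an illustrative assumption; the abstract
    only says the weight is a function of the object area.
    """
    weights = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp to zero so malformed boxes cannot yield a negative area.
        area = max(x_max - x_min, 0.0) * max(y_max - y_min, 0.0)
        weights.append(math.log1p(area))

    if normalize:
        total = sum(weights)
        if total > 0:
            # Rescale so the mean weight is 1, which keeps the overall
            # loss magnitude comparable to the unweighted loss.
            n = len(weights)
            weights = [w * n / total for w in weights]
    return weights


def weighted_loss(per_object_losses, boxes):
    """Average the per-object losses using the area-based weights."""
    weights = area_weights(boxes)
    if not weights:
        return 0.0
    weighted = sum(w * l for w, l in zip(weights, per_object_losses))
    return weighted / len(weights)


# Hypothetical example: one small and one large ground-truth box.
example_boxes = [(0, 0, 10, 10), (0, 0, 100, 100)]
example_weights = area_weights(example_boxes)
```

In this sketch the larger box receives the larger weight, so its classification and localization errors contribute more to the gradient. A real detector would apply such weights inside its own per-object loss terms rather than to a flat list of scalars.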