Correlation of Object Detection Performance with Visual Saliency and Depth Estimation
This work analyzes how object detection accuracy correlates with related visual tasks in order to guide model optimization, offering insights for feature engineering and dataset design. Its contribution is incremental, since it builds on existing models and datasets.
This paper investigates correlations between object detection accuracy and two other visual tasks, depth prediction and visual saliency prediction. It finds that visual saliency correlates more strongly with detection accuracy (up to mAρ 0.459 on Pascal VOC) than depth prediction does (up to mAρ 0.283), and that the correlations vary across object categories.
As object detection techniques continue to evolve, understanding their relationships with complementary visual tasks becomes crucial for optimising model architectures and computational resources. This paper investigates the correlations between object detection accuracy and two fundamental visual tasks: depth prediction and visual saliency prediction. Through comprehensive experiments using state-of-the-art models (DeepGaze IIE, Depth Anything, DPT-Large, and Itti's model) on the COCO and Pascal VOC datasets, we find that visual saliency shows consistently stronger correlations with object detection accuracy (mA$ρ$ up to 0.459 on Pascal VOC) than depth prediction does (mA$ρ$ up to 0.283). Our analysis reveals significant variations in these correlations across object categories, with larger objects showing correlation values up to three times higher than those of smaller objects. These findings suggest that incorporating visual saliency features into object detection architectures could be more beneficial than incorporating depth information, particularly for specific object categories. The observed category-specific variations also provide insights for targeted feature engineering and dataset design improvements, potentially leading to more efficient and accurate object detection systems.
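The abstract does not define mA$ρ$, so the following is only a minimal sketch of one plausible reading: a rank correlation is computed per object category between per-image detection scores and per-image scores from the other task (saliency or depth), and the per-category values are then averaged. The function names, the choice of Spearman correlation, and the toy numbers are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def rank(values):
    """Return 0-based ranks; ties are broken by order of appearance (a simplification)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


def mean_rho_over_categories(records):
    """records maps category -> (detection_scores, task_scores), one pair per image.

    Returns the per-category correlations and their unweighted mean.
    """
    per_category = {
        category: spearman_rho(detection, task)
        for category, (detection, task) in records.items()
    }
    return per_category, float(np.mean(list(per_category.values())))


# Made-up toy scores, purely illustrative (not data or results from the paper).
records = {
    "dog": ([0.9, 0.4, 0.7, 0.2], [0.8, 0.3, 0.6, 0.1]),
    "cup": ([0.9, 0.4, 0.7, 0.2], [0.1, 0.3, 0.6, 0.8]),
}

per_category, mean_rho = mean_rho_over_categories(records)
for category, rho in per_category.items():
    print(f"{category}: rho = {rho:+.3f}")
print(f"mean over categories = {mean_rho:+.3f}")
```

Reporting the per-category values alongside the mean matters here, because the abstract stresses that correlations differ substantially between categories and a single averaged number would hide that spread.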