CV AI

Sep 22, 2025

An Empirical Study on the Robustness of YOLO Models for Underwater Object Detection

Edwine Nabahirwa, Wei Song, Minghua Zhang, Shufan Chen

arXiv:2509.17561v13.62 citations

h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of unreliable object detection in underwater environments for computer vision applications, representing an incremental evaluation of existing methods on new data.

This study systematically evaluated YOLO models for underwater object detection, finding that YOLOv12 performs best overall but is vulnerable to noise, and that noise disrupts key features like edges and textures. The research also identified class imbalance as a persistent challenge and tested lightweight training strategies that show potential for improving robustness.

Underwater object detection (UOD) remains a critical challenge in computer vision because underwater distortions degrade low-level features and compromise the reliability of even state-of-the-art detectors. While YOLO models have become the backbone of real-time object detection, little work has systematically examined their robustness under these uniquely challenging conditions. This raises a critical question: are YOLO models genuinely robust when operating under the chaotic and unpredictable conditions of underwater environments? In this study, we present one of the first comprehensive evaluations of recent YOLO variants (YOLOv8 to YOLOv12) across six simulated underwater environments. Using a unified dataset of 10,000 annotated images from DUO and Roboflow100, we not only benchmark model robustness but also analyze how distortions affect key low-level features such as texture, edges, and color. Our findings show that (1) YOLOv12 delivers the strongest overall performance but is highly vulnerable to noise, and (2) noise disrupts edge and texture features, which explains the poor detection performance on noisy images. Class imbalance is a persistent challenge in UOD, and our experiments show that (3) image counts and instance frequency primarily drive detection performance, while object appearance exerts only a secondary influence. Finally, we evaluated two lightweight training-aware strategies. Noise-aware sample injection improves robustness in both noisy and real-world conditions. Fine-tuning with advanced enhancement boosts accuracy in enhanced domains but slightly lowers performance on the original data, which demonstrates strong potential for domain adaptation. Together, these insights provide practical guidance for building resilient and cost-efficient UOD systems.
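The abstract names noise-aware sample injection but does not describe how it is implemented. The following is only a minimal sketch of one plausible reading: noisy copies of a random subset of training images are appended to the training set. The noise model (additive Gaussian), the function names, and the `fraction`, `sigma`, and `seed` parameters are all assumptions made for illustration, not details taken from the paper. Images are represented as plain nested lists of grayscale values to keep the example dependency-free.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a grayscale image given as rows of 0-255 ints.

    Additive Gaussian noise is an assumed noise model, not the paper's.
    """
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def inject_noisy_samples(images, fraction=0.2, sigma=25.0, seed=0):
    """Append noisy copies of a random subset of images to the training set.

    Returns the augmented list and the sorted indices of the source images.
    Additive noise does not move objects, so each noisy copy can reuse the
    bounding-box labels of its source image.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    n_noisy = round(len(images) * fraction)
    chosen = sorted(rng.sample(range(len(images)), n_noisy))
    noisy = [add_gaussian_noise(images[i], sigma, rng) for i in chosen]
    return images + noisy, chosen


# Ten flat 4x4 placeholder images standing in for real training data.
clean = [[[128] * 4 for _ in range(4)] for _ in range(10)]
augmented, picked = inject_noisy_samples(clean, fraction=0.2)
print(len(augmented), len(picked))
```

In a real pipeline the same idea would be applied to the image files and label files of the detection dataset before training. The injected fraction and noise strength would need tuning against a clean validation set.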
