CV | Dec 28, 2025

Evaluating the Performance of Open-Vocabulary Object Detection in Low-quality Image

arXiv:2512.22801v2 | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of robust object detection in real-world low-quality images for computer vision applications, but it is incremental as it focuses on evaluation rather than new methods.

The study evaluated open-vocabulary object detection models on low-quality images. mAP showed no significant decrease under low-level degradation but dropped sharply for all models under high-level degradation. OWLv2 models performed best across degradation types, while OWL-ViT, GroundingDINO, and Detic declined significantly.

Open-vocabulary object detection enables models to localize and recognize objects beyond a predefined set of categories and is expected to achieve recognition capabilities comparable to human performance. In this study, we aim to evaluate the performance of existing models on open-vocabulary object detection tasks under low-quality image conditions. For this purpose, we introduce a new dataset that simulates real-world low-quality images. In our evaluation experiment, we find that although open-vocabulary object detection models show no significant decrease in mAP scores under low-level image degradation, the performance of all models drops sharply under high-level image degradation. OWLv2 models consistently perform better across different types of degradation, while OWL-ViT, GroundingDINO, and Detic show significant performance declines. We will release our dataset and code to facilitate future studies.
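
The abstract does not say which degradation types or severity levels the dataset uses. As a rough illustration of what "low-level" versus "high-level" degradation can mean, the sketch below applies two assumed degradations (additive Gaussian noise and resolution loss) at four hypothetical severity levels. The function names, the level table, and the choice of degradations are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the valid uint8 range."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def reduce_resolution(image, factor):
    """Keep every `factor`-th pixel, then repeat pixels to restore the size."""
    h, w = image.shape[:2]
    small = image[::factor, ::factor]
    restored = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return restored[:h, :w]


# Hypothetical severity table: level -> (noise sigma, downsampling factor).
# These values are illustrative assumptions, not taken from the paper.
LEVELS = {0: (0.0, 1), 1: (5.0, 1), 2: (25.0, 2), 3: (60.0, 4)}


def degrade(image, level, seed=0):
    """Return a degraded copy of `image` at the given severity level."""
    sigma, factor = LEVELS[level]
    rng = np.random.default_rng(seed)
    out = reduce_resolution(image, factor)
    if sigma > 0:
        out = add_gaussian_noise(out, sigma, rng)
    return out


def mean_abs_error(a, b):
    """Average per-pixel absolute difference between two images."""
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))


if __name__ == "__main__":
    # Synthetic horizontal gradient standing in for a real photograph.
    clean = np.tile((np.arange(64) * 4).astype(np.uint8), (64, 1))
    for level in sorted(LEVELS):
        err = mean_abs_error(clean, degrade(clean, level))
        print(f"level {level}: mean absolute error = {err:.2f}")
```

In an evaluation like the one described, each detector would be run on the clean images and on every degraded variant, and mAP would be compared per level. That step is omitted here because it depends on the specific models and annotations.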

