CV, Apr 1, 2024

Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts

arXiv:2405.14874v4, 7 citations, h-index: 22, ECCV Workshops
Originality: Synthesis-oriented
AI Analysis

This paper addresses the critical problem of out-of-distribution robustness for deploying trustworthy vision models, but its contribution is incremental: it evaluates existing models rather than proposing new solutions.

The study evaluated the robustness of three open-vocabulary object detection models (OWL-ViT, YOLO World, Grounding DINO) under distribution shifts, finding significant challenges in zero-shot capabilities across benchmarks like COCO-O, COCO-DC, and COCO-C.

The challenge of Out-Of-Distribution (OOD) robustness remains a critical hurdle towards deploying deep vision models. Vision-Language Models (VLMs) have recently achieved groundbreaking results. VLM-based open-vocabulary object detection extends the capabilities of traditional object detection frameworks, enabling the recognition and classification of objects beyond predefined categories. Investigating OOD robustness in recent open-vocabulary object detection is essential to increase the trustworthiness of these models. This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models: OWL-ViT, YOLO World, and Grounding DINO. Experiments carried out on the robustness benchmarks COCO-O, COCO-DC, and COCO-C, which encompass distribution shifts due to information loss, corruption, adversarial attacks, and geometrical deformation, highlight the challenges to the models' robustness and aim to foster research on achieving robustness. Project page: https://prakashchhipa.github.io/projects/ovod_robustness
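As a rough illustration of what evaluating under a corruption-type distribution shift involves, the sketch below perturbs an image with Gaussian noise and computes how much of a clean score is lost under the shift. It is not the paper's protocol: the function names, the severity-to-noise mapping, and the relative-drop measure are all assumptions made for this example, and the scores passed in stand for whatever detection metric an evaluation reports.

```python
import random


def gaussian_noise(image, severity, seed=0):
    """Add zero-mean Gaussian noise to a grayscale image.

    `image` is a list of rows of floats in [0, 1]. The severity-to-sigma
    mapping below is a hypothetical choice for illustration only.
    """
    sigma = 0.04 * severity
    rng = random.Random(seed)
    # Perturb every pixel, then clamp back into the valid [0, 1] range.
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def relative_drop(clean_score, shifted_score):
    """Fraction of the clean score lost under the shift (0.0 = no loss)."""
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return (clean_score - shifted_score) / clean_score


# A flat 4x4 gray image stands in for a real benchmark image.
image = [[0.5] * 4 for _ in range(4)]
noisy = gaussian_noise(image, severity=3)
```

In a real evaluation, the clean and shifted scores would come from running a detector on the original and the perturbed benchmark images; `relative_drop(40.0, 30.0)` then reads as a quarter of the clean performance being lost under the shift.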

