CV · Mar 19, 2025

Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark

arXiv:2503.14862v2 · 2 citations · h-index: 1 · ICRA
Originality: Incremental advance
AI Analysis

This paper addresses evaluation fairness for computer vision researchers, though it appears incremental because it builds on existing open-vocabulary detection frameworks.

The paper tackles unreliable evaluation in open-vocabulary object detection. It introduces 3F-OVD, a novel task that extends supervised fine-grained object detection to open-vocabulary settings, builds the NEU-171K dataset for benchmarking, and proposes a post-processing technique that improves performance.

Open-vocabulary detectors are proposed to locate and recognize objects in novel classes. However, variations in vision-aware language vocabulary data used for open-vocabulary learning can lead to unfair and unreliable evaluations. Recent evaluation methods have attempted to address this issue by incorporating object properties or adding locations and characteristics to the captions. Nevertheless, since these properties and locations depend on the specific details of the images instead of classes, detectors cannot make accurate predictions without precise descriptions provided through human annotation. This paper introduces 3F-OVD, a novel task that extends supervised fine-grained object detection to the open-vocabulary setting. Our task is intuitive and challenging, requiring a deep understanding of Fine-grained captions and careful attention to Fine-grained details in images in order to accurately detect Fine-grained objects. Additionally, due to the scarcity of qualified fine-grained object detection datasets, we have created a new dataset, NEU-171K, tailored for both supervised and open-vocabulary settings. We benchmark state-of-the-art object detectors on our dataset for both settings. Furthermore, we propose a simple yet effective post-processing technique.

