CV · Dec 4, 2023

Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

arXiv:2312.02103v1 · 2.82 citations · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses two limitations of existing open-vocabulary object detection methods, indirect supervision and a limited set of transferable concepts, in a task regarded as a crucial step toward human-like visual intelligence.

The paper tackles the problem of open-vocabulary object detection by proposing a method to directly learn region-text alignment for arbitrary concepts, resulting in competitive performance on standard benchmarks and large improvements on referring expression comprehension.

Open-vocabulary object detection (OVOD) has recently gained significant attention as a crucial step toward achieving human-like visual intelligence. Existing OVOD methods extend the target vocabulary from pre-defined categories to the open world by transferring knowledge of arbitrary concepts from vision-language pre-training models to the detectors. While previous methods have shown remarkable success, they suffer from indirect supervision or a limited set of transferable concepts. In this paper, we propose a simple yet effective method to directly learn region-text alignment for arbitrary concepts. Specifically, the proposed method learns an arbitrary image-to-text mapping for pseudo-labeling arbitrary concepts, named Pseudo-Labeling for Arbitrary Concepts (PLAC). The proposed method shows competitive performance on the standard OVOD benchmark for noun concepts and a large improvement on the referring expression comprehension benchmark for arbitrary concepts.
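The abstract does not spell out how PLAC produces its pseudo-labels. As a rough illustration of the general idea of pseudo-labeling regions via region-text alignment (a generic sketch, not PLAC's actual pipeline), the code below assumes that region embeddings and concept-phrase embeddings already live in a shared space, for example one produced by a vision-language model. It labels each region with its best-matching phrase when the softmax confidence clears a threshold. The helper `pseudo_label_regions`, the temperature, the threshold, and the toy 3-D vectors are all hypothetical. The concepts deliberately include a non-noun phrase ("person riding a bike") to mirror the paper's focus on concepts beyond nouns.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_regions(
    region_embs: List[Vector],
    concept_embs: Dict[str, Vector],
    temperature: float = 0.01,
    threshold: float = 0.5,
) -> List[Optional[str]]:
    """Assign each region the concept phrase it aligns with best, or None.

    Hypothetical helper: embeddings are assumed to share a region-text space,
    and the temperature/threshold values are illustrative, not from the paper.
    """
    names = list(concept_embs)
    labels: List[Optional[str]] = []
    for region in region_embs:
        sims = [cosine(region, concept_embs[n]) for n in names]
        probs = softmax(sims, temperature)
        best = max(range(len(names)), key=lambda i: probs[i])
        # Keep the label only if the model is confident enough; else leave unlabeled.
        labels.append(names[best] if probs[best] >= threshold else None)
    return labels


# Toy usage: concepts may be arbitrary phrases, not just nouns.
concepts = {
    "dog": [1.0, 0.0, 0.0],
    "red car": [0.0, 1.0, 0.0],
    "person riding a bike": [0.0, 0.0, 1.0],
}
regions = [
    [0.9, 0.1, 0.0],    # close to "dog"
    [0.1, 0.1, 0.95],   # close to "person riding a bike"
    [0.5, 0.5, 0.5],    # equally similar to all concepts: ambiguous
]
labels = pseudo_label_regions(regions, concepts)
print(labels)  # ['dog', 'person riding a bike', None]
```

The confidence threshold is one simple way to avoid training a detector on ambiguous pseudo-labels; the third region is left unlabeled because no single phrase dominates.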

