CV | Nov 18, 2021

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

arXiv:2111.09452v3 | 112 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable object detection across diverse categories in applications such as robotics and autonomous systems. It is a significant but incremental advance over prior open vocabulary methods.

The paper tackles the limited set of object categories in detection, a consequence of costly bounding-box annotation. It proposes automatically generating pseudo bounding-box labels from image-caption pairs and reports state-of-the-art results, including an 8% AP improvement on COCO novel categories.

Despite great progress in object detection, most existing methods work only on a limited set of object categories, due to the tremendous human effort needed for bounding-box annotations of training data. To alleviate the problem, recent open vocabulary and zero-shot detection methods attempt to detect novel object categories beyond those seen during training. They achieve this goal by training on a pre-defined set of base categories to induce generalization to novel objects. However, their potential is still constrained by the small set of base categories available for training. To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs. Our method leverages the localization ability of pre-trained vision-language models to generate pseudo bounding-box labels and then directly uses them for training object detectors. Experimental results show that our method outperforms the state-of-the-art open vocabulary detector by 8% AP on COCO novel categories, by 6.3% AP on PASCAL VOC, by 2.3% AP on Objects365 and by 2.8% AP on LVIS. Code is available at https://github.com/salesforce/PB-OVD.
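To make the idea of turning a vision-language model's localization signal into a box label more concrete, here is a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes a per-word activation map has already been computed by some pre-trained vision-language model, and it uses a simple relative threshold to derive a box. The function names `box_from_activation` and `pseudo_labels`, the `rel_threshold` parameter, and the thresholding rule itself are all hypothetical choices made for illustration.

```python
import numpy as np


def box_from_activation(activation, rel_threshold=0.5):
    """Return the tight box (x_min, y_min, x_max, y_max) around strong activations.

    Assumption: `activation` is a 2D map for one caption word, produced
    elsewhere by a vision-language model. Cells at or above
    rel_threshold * max are treated as belonging to the object.
    """
    activation = np.asarray(activation, dtype=float)
    if activation.size == 0:
        return None
    peak = activation.max()
    if peak <= 0:
        return None  # no positive evidence for this word
    ys, xs = np.nonzero(activation >= rel_threshold * peak)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def pseudo_labels(caption_words, activation_maps, vocabulary, rel_threshold=0.5):
    """Pair each caption word found in `vocabulary` with a pseudo box.

    Hypothetical helper: `activation_maps` maps a word to its 2D map.
    """
    labels = []
    for word in caption_words:
        if word not in vocabulary or word not in activation_maps:
            continue
        box = box_from_activation(activation_maps[word], rel_threshold)
        if box is not None:
            labels.append((word, box))
    return labels


# Toy example: a 6x8 map with a strong response in rows 2-4, columns 3-5.
heat = np.zeros((6, 8))
heat[2:5, 3:6] = 1.0
heat[0, 0] = 0.2  # weak background response, below the threshold
print(pseudo_labels(["a", "dog", "on", "grass"], {"dog": heat}, {"dog", "cat"}))
# prints [('dog', (3, 2, 5, 4))]
```

The resulting (word, box) pairs play the role of the pseudo bounding-box labels described in the abstract: they could be fed to a detector as if they were human annotations. A real pipeline would need a more robust box selection step than a single threshold.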

Code Implementations: 1 repo
