CVJul 18, 2023

A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

arXiv:2307.09220v298 citationsh-index: 34Has Code
AI Analysis

It provides a comprehensive overview for researchers working on scene understanding tasks, but it is incremental as it synthesizes existing work rather than introducing new methods.

This survey reviews the field of Open-Vocabulary Detection and Segmentation, which addresses the limitation of traditional models that cannot generalize beyond pre-defined categories due to high labeling costs, by organizing methodologies and analyzing key challenges and future directions.

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, i.e., state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed an increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By ``open-vocabulary'', we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review on recent developments of OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, 3D and video understanding. The main design principles, key challenges, development routes, methodology strengths, and weaknesses are thoroughly analyzed. In addition, we benchmark each task along with the vital components of each method in appendix and updated online at https://github.com/seanzhuh/awesome-open-vocabulary-detection-and-segmentation. Finally, several promising directions are provided and discussed to stimulate future research.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes