CV · Dec 7, 2023

Auto-Vocabulary Semantic Segmentation

Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald

arXiv:2312.04539v3 · 7.6 · 12 citations · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

This work removes the need for human-specified vocabularies in open-ended image understanding, offering an incremental improvement by automating category identification.

The paper tackles the fact that open-vocabulary semantic segmentation still requires predefined object categories. It introduces Auto-Vocabulary Semantic Segmentation (AVS), which autonomously identifies and segments relevant classes, and sets new benchmarks on datasets such as PASCAL VOC and Cityscapes.

Open-Vocabulary Segmentation (OVS) methods can perform semantic segmentation without relying on a fixed vocabulary and, in some cases, without training or fine-tuning. However, OVS methods typically require a human in the loop to specify the vocabulary for the task or dataset at hand. In this paper, we introduce Auto-Vocabulary Semantic Segmentation (AVS), which advances open-ended image understanding by eliminating the need to predefine object categories for segmentation. Our approach, AutoSeg, is a framework that autonomously identifies relevant class names using semantically enhanced BLIP embeddings and then segments them. Because open-ended object category predictions cannot be compared directly with a fixed ground truth, we develop a Large Language Model-based Auto-Vocabulary Evaluator (LAVE) to efficiently evaluate the automatically generated classes and their corresponding segments. In the AVS setting, our method sets new benchmarks on the PASCAL VOC, PASCAL Context, ADE20K, and Cityscapes datasets, while performing competitively with OVS methods that require specified class names.
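The abstract says only that LAVE uses an LLM to evaluate open-ended class predictions against a fixed ground truth. Below is a minimal sketch of one way such an evaluation can work, assuming the LLM's role is to map each auto-generated class name to one ground-truth class, after which standard mean IoU (mIoU) is computed. The LLM is replaced here by a hard-coded dictionary (`name_to_gt`), and `remap_to_ground_truth` and `mean_iou` are hypothetical helpers for illustration, not part of the AutoSeg code.

```python
from typing import Dict, List, Sequence

Grid = List[List[int]]


def remap_to_ground_truth(pred_mask: Grid, pred_names: Sequence[str],
                          name_to_gt: Dict[str, int]) -> Grid:
    """Relabel a predicted mask whose values index into the auto-generated
    vocabulary `pred_names` with ground-truth class ids.

    Assumption: `name_to_gt` stands in for the LLM-produced mapping.
    Names it does not cover become -1 ("unmatched").
    """
    lookup = [name_to_gt.get(name, -1) for name in pred_names]
    return [[lookup[i] for i in row] for row in pred_mask]


def mean_iou(pred: Grid, gt: Grid, num_classes: int,
             ignore_index: int = 255) -> float:
    """Standard mIoU over classes that occur in the prediction or ground truth."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if g == ignore_index:  # skip void / unlabeled pixels
                continue
            for c in range(num_classes):
                in_pred, in_gt = p == c, g == c
                inter[c] += int(in_pred and in_gt)
                union[c] += int(in_pred or in_gt)
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, with a hypothetical auto-generated vocabulary `["puppy", "grass", "kitten"]` and the mapping `{"grass": 0, "kitten": 1, "puppy": 2}` (0 = background, 1 = cat, 2 = dog), the mask `[[1, 2], [0, 0]]` becomes `[[0, 1], [2, 2]]`; scored against the ground truth `[[0, 1], [2, 1]]`, the per-class IoUs are 1.0, 0.5 and 0.5, giving an mIoU of about 0.667. Keeping the name mapping separate from the metric means the mIoU computation stays the same as in closed-vocabulary evaluation; only the relabeling step depends on the open-ended names.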
