CV | Mar 23, 2023

Zero-guidance Segmentation Using Zero Segment Labels

arXiv:2303.13396v3 | 22 citations | h-index: 19
Originality: Incremental advance
AI Analysis

This paper addresses automated semantic segmentation for applications that require unsupervised object discovery. It is an incremental advance because it builds on existing models such as CLIP.

The paper tackles the problem of discovering and labeling semantic segments in images without any user guidance, such as text queries or predefined classes. It proposes a zero-guidance segmentation method that uses pre-trained DINO and CLIP models without fine-tuning. As an example of its precision, the method can locate the Mona Lisa painting among a museum crowd.

CLIP has enabled new and exciting joint vision-language applications, one of which is open-vocabulary segmentation, which can locate any segment given an arbitrary text query. In our research, we ask whether it is possible to discover semantic segments without any user guidance in the form of text queries or predefined classes, and to label them automatically using natural language. We propose a novel problem, zero-guidance segmentation, and the first baseline that leverages two pre-trained generalist models, DINO and CLIP, to solve this problem without any fine-tuning or segmentation dataset. The general idea is to first segment an image into small over-segments, encode them into CLIP's visual-language space, translate them into text labels, and merge semantically similar segments together. The key challenge, however, is how to encode a visual segment into a segment-specific embedding that balances global and local context information, both of which are useful for recognition. Our main contribution is a novel attention-masking technique that balances the two contexts by analyzing the attention layers inside CLIP. We also introduce several metrics for the evaluation of this new task. With CLIP's innate knowledge, our method can precisely locate the Mona Lisa painting among a museum crowd. Project page: https://zero-guide-seg.github.io/.
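
The final step of the pipeline described above, merging semantically similar segments, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `merge_similar_segments`, the cosine-similarity criterion, the transitive grouping, and the `threshold` value are all assumptions made for illustration, and the toy 2-D vectors stand in for real CLIP segment embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def merge_similar_segments(embeddings, threshold=0.9):
    """Group segment indices whose embeddings are similar (hypothetical helper).

    Two segments land in the same group when their cosine similarity is at
    least `threshold`, directly or through a chain of similar segments.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)  # union the two groups

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy stand-ins for segment embeddings: the first two point in nearly the
# same direction, the third is orthogonal to the first.
segments = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(merge_similar_segments(segments))
```

In a real setting the input vectors would come from an image encoder, and the grouping rule could equally be replaced by any clustering method; the sketch only shows how a similarity threshold turns many over-segments into fewer merged ones.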

Code Implementations: 1 repo
