CV | Apr 12, 2024

Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation

arXiv:2404.08181v2 | 74 citations | h-index: 50 | Has Code | WACV
AI Analysis

The paper addresses the need for semantic segmentation that is not tied to a fixed class set, and it offers a practical approach that requires no additional data or pre-training. Its contribution is incremental in that it builds on existing CLIP models.

The paper tackles the problem of open-vocabulary semantic segmentation by proposing a training-free method that adapts CLIP to enforce patch localization in self-attention, achieving state-of-the-art performance on most of 8 benchmarks.
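To make the task concrete, here is a minimal sketch of how training-free open-vocabulary labelling can work once per-patch image embeddings and per-class text embeddings are available. It assumes both sets of embeddings already live in a shared space (as CLIP's do) and labels each patch with the class whose text embedding has the highest cosine similarity. The function names `l2_normalise` and `segment_patches` and the toy vectors are hypothetical and are not taken from the paper or its repository.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def segment_patches(patch_embeddings, text_embeddings, class_names):
    """Assign each patch the class whose text embedding is most similar.

    patch_embeddings: list of N vectors, one per image patch (assumed input).
    text_embeddings:  list of C vectors, one per class prompt (assumed input).
    class_names:      list of C strings; the vocabulary can change per call.
    """
    text_unit = [l2_normalise(t) for t in text_embeddings]
    labels = []
    for patch in patch_embeddings:
        p_unit = l2_normalise(patch)
        # Cosine similarity reduces to a dot product on unit vectors.
        sims = [sum(a * b for a, b in zip(p_unit, t)) for t in text_unit]
        labels.append(class_names[sims.index(max(sims))])
    return labels


# Toy example with 2-D embeddings standing in for real CLIP features.
toy_patches = [[1.0, 0.1], [0.1, 1.0]]
toy_text = [[1.0, 0.0], [0.0, 1.0]]
toy_labels = segment_patches(toy_patches, toy_text, ["cat", "grass"])
print(toy_labels)  # ['cat', 'grass']
```

Because the class list is just an argument, the vocabulary can be swapped at inference time without retraining, which is the sense in which the setting is open-vocabulary.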

Despite the significant progress in deep learning for dense visual recognition problems, such as semantic segmentation, traditional methods are constrained by fixed class sets. Meanwhile, vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks, owing to their robust generalizability. Recently, a body of work has investigated utilizing these models in open-vocabulary semantic segmentation (OVSS). However, existing approaches often rely on impractical supervised pre-training or access to additional pre-trained networks. In this work, we propose a strong baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP), representing a straightforward adaptation of CLIP tailored for this scenario. Our method enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature. By incorporating design choices favouring segmentation, our approach significantly improves performance without requiring additional data, auxiliary pre-trained networks, or extensive hyperparameter tuning, making it highly practical for real-world applications. Experiments are performed on 8 popular semantic segmentation benchmarks, yielding state-of-the-art performance on most scenarios. Our code is publicly available at https://github.com/sinahmr/NACLIP.
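The abstract does not spell out how localization is enforced, so the following is only an illustrative sketch of one plausible mechanism: adding a distance-based bias to the attention logits so that each patch attends more strongly to its spatial neighbours. The log-Gaussian form of the bias, the `sigma` parameter, and all function names are assumptions made for this example; the authors' actual formulation is in the linked repository.

```python
import math


def gaussian_neighbour_bias(grid_h, grid_w, sigma=1.0):
    """Build an N x N bias (N = grid_h * grid_w) over patch pairs.

    Entry [i][j] is -d^2 / (2 * sigma^2), where d is the Euclidean distance
    between patches i and j on the grid. It is 0 for a patch with itself and
    grows more negative with distance (assumed form, for illustration only).
    """
    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    return [
        [-((ri - rj) ** 2 + (ci - cj) ** 2) / (2.0 * sigma ** 2)
         for (rj, cj) in coords]
        for (ri, ci) in coords
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def neighbour_aware_attention(logits, grid_h, grid_w, sigma=1.0):
    """Add the spatial bias to raw attention logits, then apply softmax per row."""
    bias = gaussian_neighbour_bias(grid_h, grid_w, sigma)
    return [
        softmax([l + b for l, b in zip(logit_row, bias_row)])
        for logit_row, bias_row in zip(logits, bias)
    ]


# With uniform logits on a 3x3 grid, the bias alone shapes the attention.
uniform_logits = [[0.0] * 9 for _ in range(9)]
attn = neighbour_aware_attention(uniform_logits, 3, 3, sigma=1.0)
```

In this toy setting each row of `attn` still sums to 1, but the weight is concentrated on the patch itself and its nearest neighbours rather than spread evenly across the image. Adding a log-Gaussian to the logits is equivalent to multiplying the unbiased attention weights by a Gaussian window and renormalising.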

Code Implementations: 1 repo
