CV · Nov 23, 2023

Language-guided Few-shot Semantic Segmentation

arXiv:2311.13865v1 · 5 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the problem of reducing label costs for semantic segmentation in new categories, offering an incremental improvement by leveraging language guidance.

The paper tackles the challenge of few-shot semantic segmentation by using only language information (image-level text labels) instead of expensive pixel-level annotations, achieving competitive results compared to recent vision-guided methods on benchmark datasets.

Few-shot learning is a promising way to reduce the label cost of adapting to new categories, guided by a small, well-labeled support set. For few-shot semantic segmentation, however, the pixel-level annotations of support images are still expensive. In this paper, we propose an innovative solution that tackles few-shot semantic segmentation using only language information, i.e., image-level text labels. Our approach involves a vision-language-driven mask distillation scheme, which contains a vision-language pretraining (VLP) model and a mask refiner, to generate high-quality pseudo-semantic masks from text prompts. We additionally introduce a distributed prototype supervision method and a complementary correlation matching module to guide the model in mining precise semantic relations between support and query images. Experiments on two benchmark datasets demonstrate that our method establishes a new baseline for language-guided few-shot semantic segmentation and achieves results competitive with recent vision-guided methods.
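The abstract does not spell out how support and query images are matched, so the following is only a minimal sketch of generic prototype matching as commonly used in few-shot segmentation. It is not the paper's distributed prototype supervision or complementary correlation matching module. It assumes a pseudo-mask for the support image is already available (for example, from a VLP model followed by a mask refiner), and every function name, the toy feature values, and the 0.5 threshold are hypothetical.

```python
import math


def masked_average_prototype(features, mask):
    """Average the feature vectors of all pixels selected by a binary mask.

    features: H x W nested list of C-dimensional vectors (assumed input).
    mask:     H x W nested list of 0/1 values (assumed pseudo-mask).
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, selected in zip(feat_row, mask_row):
            if selected:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_query(query_features, prototype, threshold=0.5):
    """Label a query pixel 1 when its similarity to the prototype exceeds the threshold."""
    return [
        [1 if cosine(vec, prototype) > threshold else 0 for vec in row]
        for row in query_features
    ]


# Toy 2 x 2 support feature map with 2 channels; the top row is "foreground".
support_features = [[[1.0, 0.0], [1.0, 0.1]],
                    [[0.0, 1.0], [0.0, 1.0]]]
pseudo_mask = [[1, 1],
               [0, 0]]

prototype = masked_average_prototype(support_features, pseudo_mask)

# Toy 1 x 2 query feature map: first pixel resembles background, second foreground.
query_features = [[[0.0, 1.0], [2.0, 0.0]]]
prediction = segment_query(query_features, prototype)
```

In this toy setup the only supervision on the support side is the pseudo-mask, which mirrors the language-only setting described above: no pixel-level annotation is supplied by a human.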

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
