CV · AI · Mar 25, 2022

Self-supervised Semantic Segmentation Grounded in Visual Concepts

arXiv:2203.13868v2 · 9 citations · h-index: 22
AI Analysis

This paper addresses the challenging problem of pixel-level semantic segmentation without human annotations. The contribution is incremental: it builds on existing self-supervised representation learning methods.

The paper tackles unsupervised semantic segmentation by proposing a self-supervised method that uses visual concepts to learn pixel representations. It reports consistent and substantial improvements over recent approaches on PASCAL VOC 2012, COCO 2017, and DAVIS 2017.

Unsupervised semantic segmentation requires assigning a label to every pixel without any human annotations. Despite recent advances in self-supervised representation learning for individual images, unsupervised semantic segmentation with pixel-level representations is still a challenging task and remains underexplored. In this work, we propose a self-supervised pixel representation learning method for semantic segmentation by using visual concepts (i.e., groups of pixels with semantic meanings, such as parts, objects, and scenes) extracted from images. To guide self-supervised learning, we leverage three types of relationships between pixels and concepts, including the relationships between pixels and local concepts, local and global concepts, as well as the co-occurrence of concepts. We evaluate the learned pixel embeddings and visual concepts on three datasets, including PASCAL VOC 2012, COCO 2017, and DAVIS 2017. Our results show that the proposed method gains consistent and substantial improvements over recent unsupervised semantic segmentation approaches, and also demonstrate that visual concepts can reveal insights into image datasets.
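
To make the first of the three relationships (pixels and local concepts) more concrete, here is a minimal sketch of one way such a signal could be turned into a training loss. It is not the paper's formulation, which the abstract does not specify. It assumes each pixel already has an embedding and a concept assignment, builds one prototype per concept as the normalized mean of its pixels, and scores each pixel against all prototypes with a temperature-scaled softmax. The names `pixel_to_concept_loss`, `normalize`, and the `temperature` value are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def pixel_to_concept_loss(embeddings, concept_ids, temperature=0.1):
    """Average cross-entropy of each pixel against concept prototypes.

    embeddings:  list of D-dimensional pixel embeddings (hypothetical input).
    concept_ids: list of concept labels, one per pixel (hypothetical input).
    This is an illustrative assumption, not the loss used in the paper.
    """
    # Unit-length embeddings, so dot products are cosine similarities.
    z = [normalize(e) for e in embeddings]

    # Group pixel embeddings by the concept they are assigned to.
    groups = {}
    for vec, cid in zip(z, concept_ids):
        groups.setdefault(cid, []).append(vec)

    # One prototype per concept: the normalized mean of its pixel embeddings.
    prototypes = {
        cid: normalize([sum(col) / len(vecs) for col in zip(*vecs)])
        for cid, vecs in groups.items()
    }

    total = 0.0
    for vec, cid in zip(z, concept_ids):
        # Temperature-scaled similarity of this pixel to every prototype.
        logits = {
            k: sum(a * b for a, b in zip(vec, p)) / temperature
            for k, p in prototypes.items()
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Negative log-probability of the pixel's own concept.
        total += log_denom - logits[cid]
    return total / len(z)


if __name__ == "__main__":
    pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(pixel_to_concept_loss(pixels, [0, 0, 1, 1]))  # coherent concepts
    print(pixel_to_concept_loss(pixels, [0, 1, 0, 1]))  # mixed-up concepts
```

In this toy example the loss is lower when pixels grouped into the same concept have similar embeddings, which is the behaviour a pixel-to-concept objective is meant to reward. The other two relationships (local to global concepts, and concept co-occurrence) would need further terms that this sketch does not attempt.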

