CV, CL · Mar 5, 2022

Cross Language Image Matching for Weakly Supervised Semantic Segmentation

arXiv:2203.02668v2 · 167 citations · h-index: 36
AI Analysis

This work addresses the problem of generating more accurate segmentation maps from image-level labels alone, which matters for applications that need detailed object localization without costly pixel-level annotations. It represents a strong, specific advance in weakly supervised semantic segmentation.

The paper tackles weakly supervised semantic segmentation (WSSS), where Class Activation Maps (CAM) tend to activate objects only partially and to include background regions. It proposes a Cross Language Image Matching (CLIMS) framework, built on CLIP, to improve the activation maps, and reports significant performance gains on the PASCAL VOC2012 dataset.

It has been widely known that CAM (Class Activation Map) usually only activates discriminative object regions and falsely includes lots of object-related backgrounds. As only a fixed set of image-level object labels are available to the WSSS (weakly supervised semantic segmentation) model, it could be very difficult to suppress those diverse background regions consisting of open set objects. In this paper, we propose a novel Cross Language Image Matching (CLIMS) framework, based on the recently introduced Contrastive Language-Image Pre-training (CLIP) model, for WSSS. The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress closely-related open background regions. In particular, we design object, background region and text label matching losses to guide the model to excite more reasonable object regions for CAM of each category. In addition, we design a co-occurring background suppression loss to prevent the model from activating closely-related background regions, with a predefined set of class-related background text descriptions. These designs enable the proposed CLIMS to generate a more complete and compact activation map for the target objects. Extensive experiments on PASCAL VOC2012 dataset show that our CLIMS significantly outperforms the previous state-of-the-art methods.
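The three losses named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the log-based forms below are an assumption, and the function and variable names (`match_score`, `clims_style_losses`, `obj_emb`, `bg_emb`, `label_emb`, `cooccur_embs`) are hypothetical. Plain Python lists stand in for the embeddings that CLIP's image encoder would produce for the activated (object) and non-activated (background) parts of an image, and that its text encoder would produce for the class label and the background descriptions.

```python
import math

EPS = 1e-6  # keeps log() finite when a score reaches 0 or 1


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_score(u, v):
    """Map cosine similarity from [-1, 1] into (0, 1) for use inside log()."""
    return min(max((cosine(u, v) + 1.0) / 2.0, EPS), 1.0 - EPS)


def clims_style_losses(obj_emb, bg_emb, label_emb, cooccur_embs):
    """Assumed loss forms, not the paper's exact definitions."""
    # Object region vs. class label text: a high match gives a low loss.
    object_loss = -math.log(match_score(obj_emb, label_emb))
    # Background region vs. class label text: a low match gives a low loss.
    background_loss = -math.log(1.0 - match_score(bg_emb, label_emb))
    # Object region vs. each co-occurring background text: a low match
    # gives a low loss; averaged over the predefined descriptions.
    suppression_loss = sum(
        -math.log(1.0 - match_score(obj_emb, c)) for c in cooccur_embs
    ) / len(cooccur_embs)
    return {
        "object": object_loss,
        "background": background_loss,
        "suppression": suppression_loss,
    }


# Toy 2-D embeddings (hypothetical values, not CLIP outputs).
label = [1.0, 0.0]       # class label text
cooccur = [[-1.0, 0.0]]  # one co-occurring background description

# Activation covers the object: object region matches the label text.
good = clims_style_losses([1.0, 0.0], [-1.0, 0.0], label, cooccur)
# Activation covers the background instead of the object.
bad = clims_style_losses([-1.0, 0.0], [1.0, 0.0], label, cooccur)
print(sum(good.values()), sum(bad.values()))
```

In this toy setup the total loss is much lower when the activated region matches the label text and avoids the background description, which is the behaviour the abstract attributes to the combined losses. How the losses are weighted and combined in CLIMS is not stated in the abstract, so the unweighted sum used here is also an assumption.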

Code Implementations: 2 repos
