CVDec 6, 2021

GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation

arXiv:2112.02841v216 citations
Originality Highly original
AI Analysis

This addresses the challenge of pixel-level segmentation with only image-level labels, which is crucial for reducing annotation costs in computer vision, though it is an incremental improvement over existing transformer methods.

The paper tackles the partial activation problem in weakly-supervised semantic segmentation by proposing GETAM, a transformer-based method that generates fine-scale activation maps, achieving state-of-the-art results on PASCAL VOC and COCO datasets.

Weakly Supervised Semantic Segmentation (WSSS) is challenging, particularly when image-level labels are used to supervise pixel level prediction. To bridge their gap, a Class Activation Map (CAM) is usually generated to provide pixel level pseudo labels. CAMs in Convolutional Neural Networks suffer from partial activation ie, only the most discriminative regions are activated. Transformer based methods, on the other hand, are highly effective at exploring global context with long range dependency modeling, potentially alleviating the "partial activation" issue. In this paper, we propose the first transformer based WSSS approach, and introduce the Gradient weighted Element wise Transformer Attention Map (GETAM). GETAM shows fine scale activation for all feature map elements, revealing different parts of the object across transformer layers. Further, we propose an activation aware label completion module to generate high quality pseudo labels. Finally, we incorporate our methods into an end to end framework for WSSS using double backward propagation. Extensive experiments on PASCAL VOC and COCO demonstrate that our results beat the state-of-the-art end-to-end approaches by a significant margin, and outperform most multi-stage methods.m most multi-stage methods.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes