Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation
This work could benefit applications such as scene understanding and autonomous driving by improving segmentation accuracy using only image-level supervision, though the contribution is incremental because it builds on existing CAM-based methods.
The paper tackles weakly supervised semantic segmentation by addressing the tendency of Class Activation Maps to focus only on the most discriminative regions. It proposes an activation modulation and recalibration scheme that achieves state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing even some methods that rely on stronger supervision such as saliency labels.
Image-level weakly supervised semantic segmentation (WSSS) is a fundamental yet challenging computer vision task that facilitates scene understanding and autonomous driving. Most existing methods use classification-based Class Activation Maps (CAMs) as the initial pseudo labels, but these maps tend to focus on the most discriminative image regions and lack characteristics tailored to the segmentation task. To alleviate this issue, we propose a novel activation modulation and recalibration (AMR) scheme, which leverages a spotlight branch and a compensation branch to obtain weighted CAMs that provide recalibration supervision and task-specific concepts. Specifically, an attention modulation module (AMM) rearranges the distribution of feature importance from a channel-spatial sequential perspective, explicitly modeling channel-wise interdependencies and spatial encodings to adaptively modulate segmentation-oriented activation responses. Furthermore, we introduce cross pseudo supervision between the two branches, which can be regarded as a semantic similarity regularization that lets the two branches refine each other. Extensive experiments show that AMR establishes new state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing not only current methods trained with image-level supervision but also some methods relying on stronger supervision, such as saliency labels. Experiments also reveal that our scheme is plug-and-play and can be incorporated into other approaches to boost their performance.
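The two-branch idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every function name in it is hypothetical. Three things are assumptions made only for illustration, since the abstract does not specify them: `modulate` stands in for the AMM with a plain sigmoid channel gate followed by a sigmoid spatial gate, `weighted_cams` fuses the two branches with a convex combination controlled by `xi`, and `cross_pseudo_supervision` uses a mean absolute difference between the two CAMs.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def class_activation_maps(features, class_weights):
    """Compute CAMs from features (K, H, W) and classifier weights (C, K).

    Returns an array of shape (C, H, W), rectified and scaled per class
    so that values lie in [0, 1].
    """
    cams = np.einsum("ck,khw->chw", class_weights, features)
    cams = np.maximum(cams, 0.0)  # keep positive evidence only
    peak = cams.max(axis=(1, 2), keepdims=True)
    return cams / (peak + 1e-5)


def modulate(features):
    """Hypothetical stand-in for the AMM (assumption, not the paper's module).

    Applies a channel gate first, then a spatial gate, i.e. the
    channel-spatial sequential order mentioned in the abstract.
    """
    channel_gate = sigmoid(features.mean(axis=(1, 2), keepdims=True))  # (K, 1, 1)
    reweighted = features * channel_gate
    spatial_gate = sigmoid(reweighted.mean(axis=0, keepdims=True))  # (1, H, W)
    return reweighted * spatial_gate


def weighted_cams(cam_spotlight, cam_compensation, xi=0.5):
    """Fuse both branches; the convex combination and xi are assumptions."""
    return xi * cam_spotlight + (1.0 - xi) * cam_compensation


def cross_pseudo_supervision(cam_a, cam_b):
    """Assumed consistency term: mean absolute difference between two CAMs."""
    return float(np.abs(cam_a - cam_b).mean())


# Toy inputs: 8 feature channels on a 4x4 grid, 3 classes.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
class_weights = rng.standard_normal((3, 8))

cam_s = class_activation_maps(features, class_weights)            # spotlight branch
cam_c = class_activation_maps(modulate(features), class_weights)  # compensation branch
fused = weighted_cams(cam_s, cam_c)
loss = cross_pseudo_supervision(cam_s, cam_c)
```

In this sketch the consistency term is zero only when both branches produce identical maps, which is one simple way to read the "mutual refinement" role of the cross pseudo supervision. In a real training setup the gates would be learned rather than fixed.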