CVSep 22, 2023

Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

Wei Zhai, Pingyu Wu, Kai Zhu, Yang Cao, Feng Wu, Zheng-Jun Zha

arXiv:2309.12943v114.142 citationsh-index: 24Has Code

Originality Highly original

AI Analysis

This work addresses the challenge of localizing objects with only image-level labels, which is crucial for reducing annotation costs in computer vision, and it presents a novel approach that outperforms existing methods.

The paper tackles the problem of weakly supervised object localization and semantic segmentation by proposing a Background Activation Suppression (BAS) method that uses activation values instead of cross-entropy to learn object regions, achieving significant improvements and state-of-the-art performance on datasets like CUB-200-2011, ILSVRC, PASCAL VOC 2012, and MS COCO 2014.

Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve pixel-level localization. While existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator, this paper presents two astonishing experimental observations on the object localization learning process: For a trained network, as the foreground mask expands, 1) the cross-entropy converges to zero when the foreground mask covers only part of the object region. 2) The activation value continuously increases until the foreground mask expands to the object boundary. Therefore, to achieve a more effective localization performance, we argue for the usage of activation value to learn more object regions. In this paper, we propose a Background Activation Suppression (BAS) method. Specifically, an Activation Map Constraint (AMC) module is designed to facilitate the learning of generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension.

View on arXiv PDF Code

Similar