CV · Oct 16, 2020

Zoom-CAM: Generating Fine-grained Pixel Annotations from Image Labels

Xiangwei Shi, Seyran Khademi, Yunqiang Li, Jan van Gemert

arXiv:2010.08644v19.126 citations

Originality: Incremental advance

AI Analysis

This work addresses weakly supervised object localization and segmentation, offering an incremental improvement over existing class-discriminative visualization techniques such as CAM and Grad-CAM.

The paper tackles the problem of generating fine-grained pixel annotations from image-level labels. It proposes Zoom-CAM, which integrates importance maps from intermediate layers to capture small-scale objects missed by baseline methods. The resulting pseudo-labels improve top-1 error on the ImageNet localization task by more than 2.8% and improve a state-of-the-art weakly supervised semantic segmentation model by 1.1%.

Current weakly supervised object localization and segmentation methods rely on class-discriminative visualization techniques to generate pseudo-labels for pixel-level training. Such visualization methods, including class activation mapping (CAM) and Grad-CAM, use only the deepest, lowest-resolution convolutional layer, missing all information in intermediate layers. We propose Zoom-CAM: going beyond the last, lowest-resolution layer by integrating the importance maps over all activations in intermediate layers. Zoom-CAM captures fine-grained small-scale objects for various discriminative class instances, which are commonly missed by the baseline visualization methods. We focus on generating pixel-level pseudo-labels from class labels. The quality of our pseudo-labels evaluated on the ImageNet localization task exhibits more than 2.8% improvement on top-1 error. For weakly supervised semantic segmentation, our generated pseudo-labels improve a state-of-the-art model by 1.1%.
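To make the idea of looking beyond the last layer concrete, here is a minimal NumPy sketch. It is not the paper's method: it computes a standard Grad-CAM map per layer (channel weights from spatially averaged gradients), upsamples each map with nearest-neighbor indexing, and fuses them with an element-wise maximum. The fusion rule, the 0.5 threshold, and all function names (`grad_cam_map`, `upsample_nearest`, `fuse_layers`, `pseudo_label`) are assumptions chosen for illustration. Activations and gradients are assumed to have been extracted from a network beforehand.

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM map for one layer; both inputs have shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (H, W)
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence

def upsample_nearest(cam, out_hw):
    """Nearest-neighbor resize of a 2D map to out_hw = (height, width)."""
    h, w = cam.shape
    rows = np.arange(out_hw[0]) * h // out_hw[0]
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    return cam[np.ix_(rows, cols)]

def normalize(cam, eps=1e-8):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    return (cam - cam.min()) / (cam.max() - cam.min() + eps)

def fuse_layers(layer_outputs, out_hw):
    """Fuse per-layer maps by element-wise max (illustrative rule, not Zoom-CAM's).

    layer_outputs is a list of (activations, gradients) pairs, one per layer.
    """
    maps = [
        normalize(upsample_nearest(grad_cam_map(a, g), out_hw))
        for a, g in layer_outputs
    ]
    return np.max(maps, axis=0)

def pseudo_label(cam, threshold=0.5):
    """Binary pixel-level pseudo-label from a map in [0, 1] (threshold is assumed)."""
    return (cam >= threshold).astype(np.uint8)
```

With a deep layer of shape (C, 7, 7) and an intermediate layer of shape (C', 28, 28), `fuse_layers([(a_deep, g_deep), (a_mid, g_mid)], (224, 224))` returns a 224×224 map in which the higher-resolution layer can contribute detail that the deepest layer alone would blur out.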

