CV · Oct 16, 2020

Zoom-CAM: Generating Fine-grained Pixel Annotations from Image Labels

arXiv:2010.08644v1 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses weakly supervised object localization and segmentation, offering an incremental improvement over existing class-discriminative visualization techniques such as CAM and Grad-CAM.

The paper tackles the problem of generating fine-grained pixel annotations from image-level labels. It proposes Zoom-CAM, which integrates importance maps from intermediate layers to capture small-scale objects that baseline methods miss. The resulting pseudo-labels yield more than a 2.8% improvement in top-1 error on ImageNet localization and improve a state-of-the-art weakly supervised semantic segmentation model by 1.1%.

Current weakly supervised object localization and segmentation methods rely on class-discriminative visualization techniques to generate pseudo-labels for pixel-level training. Such visualization methods, including class activation mapping (CAM) and Grad-CAM, use only the deepest, lowest-resolution convolutional layer, missing all information in intermediate layers. We propose Zoom-CAM: going beyond the last, lowest-resolution layer by integrating the importance maps over all activations in intermediate layers. Zoom-CAM captures fine-grained, small-scale objects for various discriminative class instances, which are commonly missed by the baseline visualization methods. We focus on generating pixel-level pseudo-labels from class labels. The quality of our pseudo-labels, evaluated on the ImageNet localization task, exhibits more than 2.8% improvement in top-1 error. For weakly supervised semantic segmentation, our generated pseudo-labels improve a state-of-the-art model by 1.1%.
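The core idea above (compute class-discriminative importance maps at several layers, not just the last one, and combine them) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each per-layer map uses the standard Grad-CAM weighting (global-average-pooled gradients), while the fusion rule (nearest-neighbour upsampling, normalization to [0, 1], element-wise max), the helper names, and the 0.2 threshold for the pixel pseudo-label are hypothetical choices. In a real pipeline the activations and gradients would come from a network's forward and backward pass; here they are synthetic arrays.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Grad-CAM map for one layer.

    activations, gradients: arrays of shape (K, H, W) holding the feature
    maps A^k and the gradients of the class score y^c with respect to them.
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k, shape (H, W)
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence only


def upsample_nearest(m, size):
    """Nearest-neighbour resize of a 2-D map to size = (H, W)."""
    h, w = m.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return m[np.ix_(rows, cols)]


def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m, dtype=float)


def fused_cam(layers, size):
    """ASSUMED fusion rule (hypothetical, not taken from the paper):
    per-layer Grad-CAM maps are upsampled to a common resolution,
    normalized, and combined with an element-wise max."""
    maps = [normalize(upsample_nearest(grad_cam_map(a, g), size)) for a, g in layers]
    return np.max(np.stack(maps), axis=0)


def pseudo_label_mask(cam, threshold=0.2):
    """Binary pixel pseudo-label: pixels at or above a (hypothetical)
    fraction of the map's peak value."""
    return cam >= threshold * cam.max()


# Synthetic toy inputs (not from the paper): a coarse 2x2 "deep" layer that
# fires on a large object in the top-left, and a finer 4x4 "intermediate"
# layer that fires on a small object in the bottom-right corner.
deep_act = np.zeros((1, 2, 2))
deep_act[0, 0, 0] = 1.0
mid_act = np.zeros((1, 4, 4))
mid_act[0, 3, 3] = 1.0

deep_only = normalize(upsample_nearest(grad_cam_map(deep_act, np.ones_like(deep_act)), (8, 8)))
fused = fused_cam([(deep_act, np.ones_like(deep_act)), (mid_act, np.ones_like(mid_act))], (8, 8))

# Value at the small object's location: deep layer alone vs. fused map.
print(deep_only[7, 7], fused[7, 7])
# Number of pixels in each pseudo-label mask.
print(pseudo_label_mask(deep_only).sum(), pseudo_label_mask(fused).sum())
```

In this toy run the deep-layer map alone leaves the bottom-right object at zero, while the fused map marks it, which mirrors the claim that intermediate layers recover small-scale objects that last-layer CAM misses. A production version would likely use bilinear upsampling and real gradients from an autodiff framework, and might keep only the largest connected component before drawing a localization box.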

