Global Weighted Average Pooling Bridges Pixel-level Localization and Image-level Classification
This work addresses the challenge of weakly supervised object localization and detection for computer vision applications, presenting an incremental improvement over existing pooling methods.
The paper tackles simultaneous pixel-level localization and image-level classification from image-level labels alone. It proposes global weighted average pooling (GWAP) modules, which capture object regions better on the ILSVRC dataset and, when combined with R-FCN, improve the generalization of object detection on PASCAL VOC.
In this work, we first tackle the problem of simultaneous pixel-level localization and image-level classification when training a fully convolutional network with only image-level labels. We investigate global pooling methods, which play a vital role in this task. Classical global max pooling and global average pooling struggle to indicate the precise regions of objects. Therefore, we revisit the global weighted average pooling (GWAP) method for this task and propose two modules: a class-agnostic GWAP module and a class-specific GWAP module. We evaluate classification and pixel-level localization ability on the ILSVRC benchmark dataset. Experimental results show that the proposed GWAP module captures the regions of foreground objects better than the classical pooling methods. We further explore knowledge transfer between the image classification task and the region-based object detection task. We propose a multi-task framework that combines our class-specific GWAP module with R-FCN. The framework is trained with a small number of ground-truth bounding boxes and large-scale image-level labels. We evaluate this framework on the PASCAL VOC dataset. Experimental results show that this framework can use data with only image-level labels to improve the generalization of the object detection model.
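The abstract contrasts global max pooling, global average pooling, and GWAP without giving formulas. The Python sketch below illustrates the difference on a toy map under stated assumptions: GWAP is modeled as sum(w * f) / sum(w), where the spatial weight map w is an element-wise sigmoid of learned score logits (for example, the output of a 1x1 convolution). We further assume that the class-agnostic module shares one weight map across all feature channels, while the class-specific module uses a separate weight map per class. These parameterizations and all function names are hypothetical illustrations, not the paper's exact definitions.

```python
"""Toy comparison of global pooling operators on 2-D spatial maps.

Assumption (not taken from the paper): GWAP = sum(w * f) / sum(w),
with w = sigmoid(s) for a learned per-location score map s.
"""
import math
from typing import List

Map = List[List[float]]  # an H x W spatial map


def global_max_pool(f: Map) -> float:
    """Keeps only the single strongest activation, ignoring object extent."""
    return max(v for row in f for v in row)


def global_avg_pool(f: Map) -> float:
    """Weights every location equally, background included."""
    vals = [v for row in f for v in row]
    return sum(vals) / len(vals)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def weight_map(scores: Map) -> Map:
    """Hypothetical weight map: element-wise sigmoid of score logits."""
    return [[sigmoid(s) for s in row] for row in scores]


def gwap(f: Map, w: Map, eps: float = 1e-8) -> float:
    """Weighted average sum(w * f) / sum(w); eps guards against zero weights."""
    num = sum(wv * fv for wr, fr in zip(w, f) for wv, fv in zip(wr, fr))
    den = sum(v for row in w for v in row)
    return num / (den + eps)


def class_agnostic_gwap(features: List[Map], scores: Map) -> List[float]:
    """Assumed variant: one weight map shared by all C channels -> C values."""
    w = weight_map(scores)
    return [gwap(f, w) for f in features]


def class_specific_gwap(class_maps: List[Map], class_scores: List[Map]) -> List[float]:
    """Assumed variant: one weight map per class -> one pooled score per class."""
    return [gwap(a, weight_map(s)) for a, s in zip(class_maps, class_scores)]


if __name__ == "__main__":
    # Toy 3x3 map: a 2x2 "object" (value 1.0) plus one stray activation (2.0).
    fmap = [[1.0, 1.0, 0.0],
            [1.0, 1.0, 0.0],
            [0.0, 0.0, 2.0]]
    # Score logits that favor the object region and suppress everything else.
    scores = [[4.0, 4.0, -4.0],
              [4.0, 4.0, -4.0],
              [-4.0, -4.0, -4.0]]
    print("max :", global_max_pool(fmap))
    print("avg :", round(global_avg_pool(fmap), 3))
    print("gwap:", round(gwap(fmap, weight_map(scores)), 3))
```

On this toy map, global max pooling reports only the stray activation of 2.0, global average pooling dilutes the object with background to about 0.667, and the weighted average stays close to the object's value of 1.0 because the weight map down-weights the remaining locations. With uniform weights, GWAP reduces to global average pooling, which is one way to see it as a generalization of the classical operator.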