Rethinking the Route Towards Weakly Supervised Object Localization
This work addresses object localization in images when only image-level labels are available for training. It proposes a different formulation of the problem rather than an incremental improvement to existing methods.
The paper splits weakly supervised object localization into two sub-tasks, class-agnostic object localization and object classification, and proposes the PSOL method, which generates pseudo bounding-box annotations and trains a bounding box regressor on them. It achieves 58.00% localization accuracy on ImageNet and 74.97% on CUB-200, outperforming previous models by a large margin.
Weakly supervised object localization (WSOL) aims to localize objects with only image-level labels. Previous methods often localize objects indirectly, using feature maps and classification weights learned from image-level annotations. In this paper, we demonstrate that weakly supervised object localization should be divided into two parts: class-agnostic object localization and object classification. For class-agnostic object localization, we should use class-agnostic methods to generate noisy pseudo annotations and then perform bounding box regression on them without class labels. We propose the pseudo supervised object localization (PSOL) method as a new way to solve WSOL. Our PSOL models transfer well across different datasets without fine-tuning. With the generated pseudo bounding boxes, we achieve 58.00% localization accuracy on ImageNet and 74.97% localization accuracy on CUB-200, a large margin over previous models.
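The two-part split described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `box_regressor` and `classifier` are hypothetical callables standing in for trained models, and the helper names are chosen only for illustration. The scoring rule (a prediction counts as correct when the class matches and the box overlaps the ground truth with IoU of at least 0.5) is the conventional WSOL criterion and is assumed here, since the abstract does not define the metric.

```python
from typing import Any, Callable, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localize_and_classify(
    image: Any,
    box_regressor: Callable[[Any], Box],  # hypothetical class-agnostic model
    classifier: Callable[[Any], str],     # hypothetical image classifier
) -> Tuple[str, Box]:
    """Run the two sub-tasks independently and pair their outputs."""
    box = box_regressor(image)   # the box is predicted without any class label
    label = classifier(image)    # the label is predicted without any box
    return label, box


def localization_accuracy(
    predictions: Sequence[Tuple[str, Box]],
    ground_truth: Sequence[Tuple[str, Box]],
    iou_threshold: float = 0.5,  # assumed threshold, not stated in the abstract
) -> float:
    """Fraction of images with the correct label and sufficient box overlap."""
    if not predictions:
        return 0.0
    correct = sum(
        1
        for (p_label, p_box), (t_label, t_box) in zip(predictions, ground_truth)
        if p_label == t_label and iou(p_box, t_box) >= iou_threshold
    )
    return correct / len(predictions)
```

Because the regressor never sees a class label, the same `box_regressor` could in principle be paired with a different `classifier` on another dataset, which is one way to read the transferability claim above.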