Convolutional STN for Weakly Supervised Object Localization
This addresses the problem of localizing objects with weak supervision for computer vision applications, but it is incremental as it builds on existing CNN-based methods.
The paper tackles weakly supervised object localization by proposing a convolutional, multi-scale spatial localization network to improve accuracy, achieving competitive performance on CUB-200-2011 and ImageNet datasets.
Weakly supervised object localization is a challenging task in which the object of interest should be localized while learning its appearance. State-of-the-art methods recycle the architecture of a standard CNN by using the activation maps of the last layer for localizing the object. While this approach is simple and works relatively well, object localization relies on different features than classification, thus, a specialized localization mechanism is required during training to improve performance. In this paper, we propose a convolutional, multi-scale spatial localization network that provides accurate localization for the object of interest. Experimental results on CUB-200-2011 and ImageNet datasets show that our proposed approach provides competitive performance for weakly supervised localization.