CV | Jun 9, 2020

Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps

Xiaolin Zhang, Yunchao Wei, Yi Yang, Fei Wu

arXiv:2006.05220v211.131 citations | Has Code

Originality: Highly original

AI Analysis

This work addresses evaluation limitations in weakly supervised object localization for computer vision researchers, offering improved metrics and a novel method for better object perception.

The paper argues that weakly supervised object localization maps should be evaluated directly: it proposes a pixel-wise comparison against annotated object masks, summarized with IoU-Threshold curves. It also introduces a self-enhancement method that achieves a state-of-the-art localization accuracy of 54.88% on ILSVRC.
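To make the idea of an IoU-Threshold curve concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `iou_threshold_curve`, the normalization of the map to [0, 1], and the `score >= t` binarization rule are all assumptions introduced here.

```python
def iou_threshold_curve(loc_map, gt_mask, thresholds):
    """Return one pixel-wise IoU value per threshold.

    loc_map: 2D list of floats, assumed normalized to [0, 1].
    gt_mask: 2D list of 0/1 ints with the same shape (object mask).
    thresholds: iterable of floats used to binarize loc_map.
    """
    curve = []
    for t in thresholds:
        intersection = 0
        union = 0
        for map_row, mask_row in zip(loc_map, gt_mask):
            for score, label in zip(map_row, mask_row):
                predicted = score >= t  # assumed binarization rule
                is_object = label == 1
                if predicted and is_object:
                    intersection += 1
                if predicted or is_object:
                    union += 1
        # Define IoU as 0 when neither the prediction nor the mask has pixels.
        curve.append(intersection / union if union else 0.0)
    return curve


# Toy 2x2 example: the object occupies the top row.
toy_map = [[0.9, 0.6], [0.2, 0.1]]
toy_mask = [[1, 1], [0, 0]]
toy_curve = iou_threshold_curve(toy_map, toy_mask, [0.0, 0.5, 0.8])
```

In the toy example, a threshold that is too low marks background as object and a threshold that is too high misses part of the object. Plotting IoU across thresholds therefore shows how well the map separates object from background, rather than how well a single box fits.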

Recently, remarkable progress has been made in weakly supervised object localization (WSOL) to produce better object localization maps. The common practice for evaluating these maps is indirect and coarse: tight bounding boxes are fitted around the high-activation regions, and intersection-over-union (IoU) scores are computed between the predicted and ground-truth boxes. This measurement reflects the quality of localization maps to some extent, but we argue that the maps should be measured directly and at a finer granularity, i.e., by comparing them pixel-wise with the ground-truth object masks. To enable this direct evaluation, we annotate pixel-level object masks on the ILSVRC validation set and propose IoU-Threshold curves for evaluating the real quality of localization maps. Beyond the amended evaluation metric and the annotated object masks, this work also introduces a novel self-enhancement method that harvests accurate object localization maps and object boundaries with only category labels as supervision. We propose a two-stage approach that generates the localization maps by simply comparing the similarity of point-wise features between the high-activation pixels and the remaining pixels. Based on the predicted localization maps, we explore estimating object boundaries on a very large dataset. A hard-negative suppression loss is proposed for obtaining fine boundaries. We conduct extensive experiments on the ILSVRC and CUB benchmarks. In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC. The code and the annotated masks are released at https://github.com/xiaomengyc/SEM.
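The abstract describes the map-generation step only as comparing point-wise feature similarity between high-activation pixels and the remaining pixels. The sketch below shows one way such a comparison could look; it is not the authors' implementation. The names `cosine` and `enhance_map`, the use of cosine similarity, the averaging of seed features into a single prototype, and the `seed_threshold` value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def enhance_map(features, activation, seed_threshold=0.7):
    """Re-score every pixel by its similarity to high-activation "seed" pixels.

    features: list of per-pixel feature vectors (flattened spatial grid).
    activation: list of initial activation scores, one per pixel.
    seed_threshold: hypothetical cut-off for choosing seed pixels.
    """
    seeds = [f for f, a in zip(features, activation) if a >= seed_threshold]
    if not seeds:
        return list(activation)  # nothing to compare against; keep the input
    dim = len(seeds[0])
    # Assumed design: average the seed features into one prototype vector.
    prototype = [sum(f[d] for f in seeds) / len(seeds) for d in range(dim)]
    # Clamp negative similarities to 0 so the result reads as a map.
    return [max(0.0, cosine(f, prototype)) for f in features]


# Toy example: pixel 1 has a weak activation but seed-like features.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_activation = [0.9, 0.3, 0.1]
enhanced = enhance_map(toy_features, toy_activation)
```

In the toy example, the second pixel starts with a low activation of 0.3 but has features close to the seed pixel, so its score rises, while the dissimilar third pixel stays at zero. This is the intuition behind recovering object regions that the initial map under-activates.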
