CV LG · Jan 21, 2020

Evaluating Weakly Supervised Object Localization Methods Right

Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim

arXiv:2001.07437v229.2204 citations · h-index: 48 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work critiques evaluation practices in WSOL, showing that recent methods may not deliver the label-efficient localization that computer vision researchers are looking for.

The paper argues that weakly supervised object localization (WSOL) is ill-posed with only image-level labels and proposes a new evaluation protocol using limited full supervision on a held-out set, finding that five recent WSOL methods show no major improvement over the CAM baseline and do not reach few-shot learning baselines.

Weakly-supervised object localization (WSOL) has gained popularity in recent years for its promise to train localization models with only image-level labels. Since the seminal WSOL work of class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization supervision to validate hyperparameters and to select models, which is in principle prohibited under the WSOL setup. In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to a small held-out set that does not overlap with the test set. We observe that, under our protocol, the five most recent WSOL methods have not made a major improvement over the CAM baseline. Moreover, we report that existing WSOL methods have not reached the few-shot learning baseline, where the full supervision available at validation time is used for model training instead. Based on our findings, we discuss some future directions for WSOL.
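The held-out protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or their exact metric. It assumes that the hyperparameter being tuned is a score-map threshold, that localization counts as correct when the predicted box reaches an IoU of 0.5 with the ground-truth box, and that the helper names (`box_from_scores`, `loc_accuracy`, `select_threshold`) are hypothetical.

```python
def box_from_scores(score_map, threshold):
    """Tightest box (x0, y0, x1, y1) around cells scoring >= threshold."""
    cells = [(x, y) for y, row in enumerate(score_map)
             for x, s in enumerate(row) if s >= threshold]
    if not cells:
        return None
    xs, ys = zip(*cells)
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a, b):
    """Intersection over union of two inclusive-pixel boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def loc_accuracy(score_maps, gt_boxes, threshold, min_iou=0.5):
    """Fraction of images whose thresholded box matches the ground truth.

    min_iou=0.5 is an assumed criterion, not taken from the abstract.
    """
    hits = 0
    for score_map, gt in zip(score_maps, gt_boxes):
        box = box_from_scores(score_map, threshold)
        if box is not None and iou(box, gt) >= min_iou:
            hits += 1
    return hits / len(gt_boxes)


def select_threshold(val_maps, val_boxes, candidates):
    """Pick the threshold using ONLY the small, fully supervised held-out set."""
    return max(candidates, key=lambda t: loc_accuracy(val_maps, val_boxes, t))
```

In use, `select_threshold` would be called once on the held-out images, and the chosen threshold would then be passed unchanged to `loc_accuracy` on a disjoint test set, so that no test-set box annotations influence hyperparameter or model selection.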
