CV · Feb 20, 2020

Learning Object Scale With Click Supervision for Object Detection

arXiv:2002.08555v1 · 3 citations
AI Analysis

This paper addresses the need to reduce annotation costs in object detection while improving performance, though the contribution is incremental because it builds on existing weakly-supervised approaches.

The paper tackles the problem of weakly-supervised object detection by proposing a method that uses click supervision and CNN visualization to generate pseudo ground-truth bounding boxes, achieving higher accuracy in object scale estimation compared to state-of-the-art methods on PASCAL VOC datasets.

Weakly-supervised object detection has recently attracted increasing attention since it only requires image-level annotations. However, the performance obtained by existing methods is still far from being satisfactory compared with fully-supervised object detection methods. To achieve a good trade-off between annotation cost and object detection performance, we propose a simple yet effective method which incorporates CNN visualization with click supervision to generate the pseudo ground-truths (i.e., bounding boxes). These pseudo ground-truths can be used to train a fully-supervised detector. To estimate the object scale, we firstly adopt a proposal selection algorithm to preserve high-quality proposals, and then generate Class Activation Maps (CAMs) for these preserved proposals by the proposed CNN visualization algorithm called Spatial Attention CAM. Finally, we fuse these CAMs together to generate pseudo ground-truths and train a fully-supervised object detector with these ground-truths. Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets show that the proposed method can obtain much higher accuracy for estimating the object scale, compared with the state-of-the-art image-level based methods and the center-click based method.
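The pipeline in the abstract (select proposals, compute a map per proposal, fuse the maps, read off a box) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The selection rule (keep proposals that contain the click), the averaging fusion, the relative threshold, and all function names are assumptions made for illustration. The per-proposal activation maps are taken as given inputs, because the abstract does not specify how Spatial Attention CAM computes them.

```python
import numpy as np

def select_proposals(proposals, click, max_keep=50):
    """Keep proposals (x0, y0, x1, y1) that contain the click (cx, cy).

    Hypothetical selection rule; the paper's algorithm may differ.
    """
    cx, cy = click
    kept = [p for p in proposals if p[0] <= cx < p[2] and p[1] <= cy < p[3]]
    return kept[:max_keep]

def fuse_maps(shape, proposals, maps):
    """Average per-proposal activation maps on an image-sized canvas.

    shape is (height, width); maps[i] has shape (y1 - y0, x1 - x0).
    Pixels covered by no proposal are left at 0.
    """
    total = np.zeros(shape, dtype=float)
    count = np.zeros(shape, dtype=float)
    for (x0, y0, x1, y1), m in zip(proposals, maps):
        total[y0:y1, x0:x1] += m
        count[y0:y1, x0:x1] += 1
    return np.divide(total, count, out=np.zeros(shape), where=count > 0)

def box_from_map(fused, rel_thresh=0.5):
    """Tight box (x0, y0, x1, y1) around pixels >= rel_thresh * max.

    Returns None when the fused map has no positive activation.
    """
    peak = fused.max()
    if peak <= 0:
        return None
    ys, xs = np.nonzero(fused >= rel_thresh * peak)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# Toy usage: one click, two candidate proposals, a synthetic activation map.
proposals = [(2, 2, 8, 8), (0, 0, 3, 3)]
kept = select_proposals(proposals, click=(5, 5))
toy_map = np.zeros((6, 6))
toy_map[1:4, 2:5] = 1.0  # stand-in for a real CAM
pseudo_box = box_from_map(fuse_maps((10, 10), kept, [toy_map]))
```

In a full system the resulting boxes would serve as the pseudo ground-truths used to train a standard fully-supervised detector, as the abstract describes.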

