CV · Jul 17, 2020

Improving Object Detection with Selective Self-supervised Self-training

arXiv:2007.09162v2 · 71 citations
AI Analysis

This work addresses the challenge of improving object detection for everyday scenes by leveraging diverse Web data, representing an incremental advance in self-supervised self-training methods.

The paper tackles the problem of augmenting human-curated object detection datasets with Web images by using image-to-image search to reduce domain shift and a selective net to rectify supervision signals, achieving state-of-the-art results on detecting backpacks, chairs, and other challenging classes.

We study how to leverage Web images to augment human-curated object detection datasets. Our approach is two-pronged. On the one hand, we retrieve Web images by image-to-image search, which incurs less domain shift from the curated data than other search methods. The Web images are diverse, supplying a wide variety of object poses, appearances, their interactions with the context, etc. On the other hand, we propose a novel learning method motivated by two parallel lines of work that explore unlabeled data for image classification: self-training and self-supervised learning. They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets. To tackle this challenge, we propose a selective net to rectify the supervision signals in Web images. It not only identifies positive bounding boxes but also creates a safe zone for mining hard negative boxes. We report state-of-the-art results on detecting backpacks and chairs from everyday scenes, along with other challenging object classes.
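The idea of keeping confident positives while mining hard negatives only from a "safe zone" can be illustrated with a minimal sketch. The abstract does not say how the selective net scores boxes or how the safe zone is defined, so everything here is an assumption made for illustration: the score thresholds, the IoU-based exclusion rule, and the names `split_pseudo_boxes` and `safe_negatives` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_pseudo_boxes(
    boxes: List[Box],
    scores: List[float],
    pos_thresh: float = 0.8,  # assumed value, not from the paper
    neg_thresh: float = 0.2,  # assumed value, not from the paper
) -> Tuple[List[Box], List[Box]]:
    """Split pseudo-labeled boxes into confident positives and uncertain boxes.

    Boxes scoring below neg_thresh are dropped and treated as background.
    """
    positives = [b for b, s in zip(boxes, scores) if s >= pos_thresh]
    uncertain = [b for b, s in zip(boxes, scores) if neg_thresh <= s < pos_thresh]
    return positives, uncertain


def safe_negatives(
    candidates: List[Box],
    positives: List[Box],
    uncertain: List[Box],
    max_iou: float = 0.3,  # assumed value, not from the paper
) -> List[Box]:
    """Keep candidate boxes that stay clear of every positive or uncertain box.

    This is one assumed reading of a "safe zone": regions where a hard
    negative is unlikely to be a missed object.
    """
    blocked = positives + uncertain
    return [c for c in candidates if all(iou(c, b) < max_iou for b in blocked)]
```

In this sketch, a candidate that overlaps a confident or uncertain pseudo-box is excluded from negative mining, so only boxes far from any plausible object are used as hard negatives. A learned selective net would replace the fixed thresholds used here.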

