Weakly- and Semi-Supervised Object Detection with Expectation-Maximization Algorithm
This work addresses the problem of reducing annotation costs for object detection in computer vision, offering a practical approach for scenarios with limited labeled data.
The paper tackles object detection trained with image-level labels instead of costly instance-level labels, using an Expectation-Maximization method built on CNNs. It reports a significant performance improvement in the weakly supervised setting and nearly matches the fully supervised Fast RCNN when a small number of instance-level annotated images is available.
Object detection trained with image-level labels instead of instance-level labels (i.e., bounding boxes) is an important problem in computer vision, since large-scale image datasets with instance-level labels are extremely costly to obtain. In this paper, we address this challenging problem by developing an Expectation-Maximization (EM) based object detection method using deep convolutional neural networks (CNNs). Our method is applicable to both the weakly-supervised and semi-supervised settings. Extensive experiments on the PASCAL VOC 2007 benchmark show that (1) in the weakly supervised setting, our method provides a significant detection performance improvement over current state-of-the-art methods, and (2) with access to a small number of strongly (instance-level) annotated images, our method can almost match the performance of the fully supervised Fast RCNN. We share our source code at https://github.com/ZiangYan/EM-WSD.
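To make the general idea concrete, the following is a minimal sketch of a hard-EM loop for weakly supervised localization. It is an illustration only and not the authors' CNN-based method: a linear logistic scorer over precomputed proposal features stands in for the detector, and all names (`hard_em`, `localize`, `bags`) are hypothetical. The E-step treats the object location in each positively labeled image as a latent variable and picks the currently highest-scoring proposal; the M-step refits the scorer on those pseudo-labels.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def localize(feats, w, b):
    """Return the index of the highest-scoring proposal in one image."""
    return int(np.argmax(feats @ w + b))


def hard_em(bags, labels, n_rounds=10, m_steps=50, lr=0.5):
    """Toy hard-EM with image-level labels only (assumed setup, not the paper's).

    bags:   list of (n_proposals, dim) arrays, one per image
    labels: list of 0/1 image-level labels (1 = object present)
    """
    dim = bags[0].shape[1]
    w, b = np.zeros(dim), 0.0  # zero init: the first E-step picks proposal 0
    for _ in range(n_rounds):
        # E-step: choose one latent "object" proposal per positive image;
        # every proposal in a negative image is a known background example.
        X, y = [], []
        for feats, label in zip(bags, labels):
            if label == 1:
                X.append(feats[localize(feats, w, b)])
                y.append(1.0)
            else:
                X.extend(feats)
                y.extend([0.0] * len(feats))
        X, y = np.asarray(X), np.asarray(y)
        # M-step: gradient steps on the logistic loss with the pseudo-labels.
        for _ in range(m_steps):
            err = sigmoid(X @ w + b) - y
            w -= lr * (X.T @ err) / len(y)
            b -= lr * err.mean()
    return w, b
```

In the semi-supervised setting, one plausible variant of this sketch would skip the E-step for images that already have instance-level annotations and use their given boxes directly as positives.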