Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation
This work addresses the high cost of pixel-level annotation in computer vision by enabling segmentation models to be trained with far less strongly labeled data.
The paper tackles the problem of training deep convolutional neural networks for semantic image segmentation using weakly annotated data (such as bounding boxes or image-level labels) or a mix of a few strongly labeled and many weakly labeled images, achieving competitive results on the PASCAL VOC 2012 benchmark with significantly reduced annotation effort.
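The simplest way to turn bounding boxes into pixel-level training targets is to fill each box with its class label and treat every pixel outside all boxes as background. The sketch below does exactly that. It is an illustrative baseline built on assumptions of our own (the `(class_id, x0, y0, x1, y1)` box format with exclusive upper bounds, a smallest-box-wins rule for overlaps, and the hypothetical function name `boxes_to_label_map`), not necessarily the procedure used in the paper.

```python
from typing import List, Tuple

# Hypothetical box format: (class_id, x0, y0, x1, y1); x1 and y1 are exclusive.
Box = Tuple[int, int, int, int, int]


def box_area(box: Box) -> int:
    """Area of a box in pixels (0 for degenerate boxes)."""
    _, x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def boxes_to_label_map(
    height: int, width: int, boxes: List[Box], background: int = 0
) -> List[List[int]]:
    """Rasterize bounding boxes into a coarse per-pixel label map.

    Pixels outside every box keep the background label. Where boxes overlap,
    the smaller box wins (an assumption: small objects tend to be in front).
    """
    labels = [[background] * width for _ in range(height)]
    # Paint larger boxes first so that smaller boxes overwrite them.
    for cls, x0, y0, x1, y1 in sorted(boxes, key=box_area, reverse=True):
        # Clip each box to the image bounds before filling it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                labels[y][x] = cls
    return labels


if __name__ == "__main__":
    # A 4x6 image with a large class-15 box and a smaller overlapping class-8 box.
    for row in boxes_to_label_map(4, 6, [(15, 0, 0, 4, 4), (8, 2, 1, 6, 3)]):
        print(row)
```

Such rectangle-filled targets are deliberately coarse: every background pixel inside a box is mislabeled as foreground, which is one motivation for estimating better latent segmentations during training.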
Deep convolutional neural networks (DCNNs) trained on a large number of images with strong pixel-level annotations have recently advanced the state of the art in semantic image segmentation significantly. We study the more challenging problem of learning DCNNs for semantic image segmentation from either (1) weakly annotated training data such as bounding boxes or image-level labels or (2) a combination of a few strongly labeled and many weakly labeled images, sourced from one or multiple datasets. We develop Expectation-Maximization (EM) methods for semantic image segmentation model training under these weakly supervised and semi-supervised settings. Extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark, while requiring significantly less annotation effort. We share source code implementing the proposed system at https://bitbucket.org/deeplab/deeplab-public.
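The abstract names EM but does not spell out its updates. One plausible way to instantiate the E-step for image-level labels is sketched below, under stated assumptions. Each pixel's latent label is restricted to background or to the classes the image is tagged with. It is then chosen by the argmax of the current model's log-probability plus a fixed per-class bias that favors those foreground classes. For the semi-supervised mix, images that have pixel-level ground truth simply use it. The function names `e_step` and `training_targets`, the parameters `fg_bias` and `bg_bias`, and their default values are all hypothetical, and the paper's exact estimator may differ.

```python
import math
from typing import List, Optional, Sequence, Set

BACKGROUND = 0  # assumption: class 0 is background, as in PASCAL VOC


def e_step(
    log_probs: Sequence[Sequence[float]],
    present: Set[int],
    fg_bias: float = 1.0,  # hypothetical value
    bg_bias: float = 0.0,  # hypothetical value
) -> List[int]:
    """Estimate a latent per-pixel segmentation from image-level labels.

    log_probs[m][l] is the current model's log-probability of class l at
    pixel m. Only background and the classes in `present` are candidates;
    a fixed bias nudges pixels toward the foreground classes in the image.
    """
    labels = []
    for scores in log_probs:
        best_label, best_score = BACKGROUND, -math.inf
        for l, s in enumerate(scores):
            if l == BACKGROUND:
                score = s + bg_bias
            elif l in present:
                score = s + fg_bias
            else:
                continue  # class absent from the image-level annotation
            if score > best_score:
                best_label, best_score = l, score
        labels.append(best_label)
    return labels


def training_targets(
    log_probs: Sequence[Sequence[float]],
    present: Set[int],
    strong: Optional[List[int]] = None,
) -> List[int]:
    """Semi-supervised mix: use pixel-level ground truth when the image has it,
    otherwise fall back to the E-step estimate. In this sketch, the M-step is
    assumed to train the DCNN on these targets with per-pixel cross-entropy."""
    return list(strong) if strong is not None else e_step(log_probs, present)


if __name__ == "__main__":
    # Three pixels, four classes (0 = background); the image is tagged {2}.
    lp = [[math.log(p) for p in row] for row in (
        [0.5, 0.1, 0.3, 0.1],     # background leads, but the bias favors class 2
        [0.2, 0.6, 0.1, 0.1],     # class 1 is absent from the tags, so excluded
        [0.9, 0.05, 0.03, 0.02],  # background wins despite the bias
    )]
    print(e_step(lp, {2}))
```

Alternating this E-step with network updates lets the model's own predictions, constrained by the weak labels, gradually replace the missing pixel-level annotation. Setting `fg_bias` equal to `bg_bias` removes the foreground preference. In this toy example, that causes every pixel to fall back to background.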