WebSeg: Learning Semantic Segmentation from Web Searches
This work reduces the dependence of semantic segmentation on annotations, an incremental improvement over previous weakly supervised methods.
The paper tackles the problem of semantic segmentation by learning from web-crawled images without explicit annotations, achieving mIoU scores of 57.0% in a web-based setting and 63.3% in a weakly supervised setting on the PASCAL VOC 2012 benchmark.
In this paper, we improve semantic segmentation by automatically learning from Flickr images associated with a particular keyword, without relying on any explicit user annotations, thus substantially alleviating the dependence on accurate annotations when compared to previous weakly supervised methods. To solve this challenging problem, we leverage several low-level cues (such as saliency and edges) to help generate a proxy ground truth. Because web-crawled images are diverse, we anticipate a large amount of 'label noise', in which objects other than the one named by the keyword are present. We design an online noise filtering scheme that is able to deal with this label noise, especially in cluttered images. We use this filtering strategy as an auxiliary module to help the segmentation network learn from cleaner proxy annotations. Extensive experiments on the popular PASCAL VOC 2012 semantic segmentation benchmark show surprisingly good results in both our WebSeg (mIoU = 57.0%) and weakly supervised (mIoU = 63.3%) settings.
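To make the idea of a proxy ground truth concrete, the following is a minimal sketch of how a saliency map for a keyword-retrieved image might be turned into a pixel-wise label map. It is an illustration under stated assumptions, not the procedure used in this paper: the function name `proxy_ground_truth`, the two thresholds `fg_thresh` and `bg_thresh`, and the rule of marking uncertain pixels as ignored are all hypothetical, and the sketch uses saliency only (no edge cue and no noise filtering). The value 255 follows the PASCAL VOC convention for pixels excluded from training and evaluation.

```python
from typing import List

BACKGROUND = 0   # background label
IGNORE = 255     # PASCAL VOC convention for pixels excluded from the loss


def proxy_ground_truth(
    saliency: List[List[float]],
    class_id: int,
    fg_thresh: float = 0.7,   # assumed value, chosen for illustration only
    bg_thresh: float = 0.3,   # assumed value, chosen for illustration only
) -> List[List[int]]:
    """Convert a saliency map with values in [0, 1] into a proxy label map.

    Confidently salient pixels receive the class implied by the search
    keyword, confidently non-salient pixels receive the background label,
    and everything in between is marked as ignored.
    """
    if not 0.0 <= bg_thresh <= fg_thresh <= 1.0:
        raise ValueError("require 0 <= bg_thresh <= fg_thresh <= 1")

    labels: List[List[int]] = []
    for row in saliency:
        out_row: List[int] = []
        for s in row:
            if s >= fg_thresh:
                out_row.append(class_id)      # likely the keyword object
            elif s <= bg_thresh:
                out_row.append(BACKGROUND)    # likely background
            else:
                out_row.append(IGNORE)        # too uncertain to supervise
        labels.append(out_row)
    return labels


if __name__ == "__main__":
    # Toy 2x2 saliency map; class id 12 is an arbitrary example value.
    toy_saliency = [[0.9, 0.5], [0.1, 0.7]]
    print(proxy_ground_truth(toy_saliency, class_id=12))
```

Marking mid-range pixels as ignored is one simple way to keep uncertain regions from contributing to the loss. It does not address images in which the salient object is not the one named by the keyword, which is the kind of label noise that a separate filtering step would have to handle.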