CV · Mar 5, 2015

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

arXiv:1503.01640v2 · 1100 citations
AI Analysis

BoxSup reduces the labeling effort needed for semantic segmentation, which makes the task more accessible. The contribution is incremental, though, since it builds on existing region proposal and network training methods.

The paper addresses the cost of pixel-level annotation for semantic segmentation. It proposes BoxSup, a method that trains convolutional networks from bounding box annotations alone, reaching accuracy on par with mask-supervised baselines under the same setting and, when a large amount of bounding boxes is leveraged, state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.

Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks that usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These two steps gradually recover segmentation masks for improving the networks, and vice versa. Our method, called BoxSup, produces competitive results supervised by boxes only, on par with strong baselines fully supervised by masks under the same setting. By leveraging a large amount of bounding boxes, BoxSup further unleashes the power of deep convolutional networks and yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.
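
The alternation described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the selection score (IoU between a candidate mask and its box, plus an optional agreement term), the `train` callable, and all names below are assumptions made for illustration, and the "network" is replaced by a stub that simply remembers the masks it was given.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def box_to_mask(box: Box) -> Mask:
    """Rasterize a box into the set of pixels it covers."""
    x0, y0, x1, y1 = box
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_masks(boxes: List[Box], candidates: List[Mask],
                 agreement: Optional[Callable[[Mask], float]] = None,
                 weight: float = 1.0) -> List[Mask]:
    """For each box, pick the candidate mask with the highest score.

    Assumed score: IoU with the box, plus a weighted agreement term
    standing in for how well the current network explains the mask.
    """
    chosen = []
    for box in boxes:
        target = box_to_mask(box)

        def score(c: Mask) -> float:
            s = iou(target, c)
            if agreement is not None:
                s += weight * agreement(c)
            return s

        chosen.append(max(candidates, key=score))
    return chosen


def alternate(boxes: List[Box], candidates: List[Mask],
              train: Callable[[List[Mask]], Callable[[Mask], float]],
              rounds: int = 3) -> List[Mask]:
    """Alternate between choosing masks and (re)training on them."""
    agreement = None
    masks: List[Mask] = []
    for _ in range(rounds):
        masks = select_masks(boxes, candidates, agreement)
        agreement = train(masks)  # hypothetical training step
    return masks


def toy_train(masks: List[Mask]) -> Callable[[Mask], float]:
    """Stub 'network': agreement is the best IoU with the masks it was trained on."""
    return lambda c: max(iou(c, m) for m in masks)


boxes = [(2, 2, 6, 6)]
candidates = [box_to_mask((0, 0, 8, 8)),
              box_to_mask((2, 2, 6, 6)),
              box_to_mask((3, 3, 5, 5))]
estimated = alternate(boxes, candidates, toy_train, rounds=2)
```

In a real system the candidates would come from a region proposal method and `train` would update a convolutional network on the selected masks; the sketch only shows how the two steps feed each other.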

