TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection
TS2C addresses the tendency of current methods to get trapped by discriminative object parts rather than entire objects, which improves detection accuracy for computer vision applications. The contribution is incremental in that it builds on existing weakly supervised segmentation.
This paper tackles the problem of discovering tight object bounding boxes with only image-level supervision in weakly supervised object detection, achieving new state-of-the-art mAP scores of 48.0% on VOC 2007 and 44.4% on VOC 2012.
This work provides a simple approach to discovering tight object bounding boxes with only image-level supervision, called Tight box mining with Surrounding Segmentation Context (TS2C). We observe that object candidates mined through current multiple instance learning methods are usually trapped on discriminative object parts rather than the entire object. TS2C leverages surrounding segmentation context derived from weakly supervised segmentation to suppress such low-quality, distracting candidates and boost the high-quality ones. Specifically, TS2C is developed based on two key properties of desirable bounding boxes: 1) high purity, meaning most pixels in the box have high object response, and 2) high completeness, meaning the box comprehensively covers the high object response pixels. With these novel and computable criteria, tighter candidates can be discovered for learning a better object detector. With TS2C, we obtain 48.0% and 44.4% mAP on the VOC 2007 and 2012 benchmarks, respectively, which are new state-of-the-art results.
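
To make the two criteria concrete, the following is a minimal sketch of how purity and completeness could be scored from a per-pixel object response map. It is not the paper's exact formulation: the function name `purity_and_completeness`, the binarization `threshold`, and the `margin` that defines the surrounding region are all assumptions made for illustration. Purity is taken here as the fraction of pixels inside the box with high response, and completeness as the share of high-response pixels in the box plus its surrounding region that the box itself captures.

```python
import numpy as np

def purity_and_completeness(response, box, threshold=0.5, margin=0.2):
    """Score a box against a 2D object response map (illustrative only).

    response : (H, W) array of object responses in [0, 1]
    box      : (x1, y1, x2, y2), with x2 and y2 exclusive
    threshold, margin : hypothetical parameters, not taken from the paper
    """
    h, w = response.shape
    x1, y1, x2, y2 = box
    high = response >= threshold          # binary mask of high-response pixels

    inside = high[y1:y2, x1:x2]
    n_inside = inside.sum()
    # Purity: share of box pixels that have high response.
    purity = n_inside / inside.size if inside.size else 0.0

    # Surrounding region: the box enlarged by `margin` on each side, clipped to the image.
    mx = int(round(margin * (x2 - x1)))
    my = int(round(margin * (y2 - y1)))
    sx1, sy1 = max(0, x1 - mx), max(0, y1 - my)
    sx2, sy2 = min(w, x2 + mx), min(h, y2 + my)
    n_context = high[sy1:sy2, sx1:sx2].sum()
    # Completeness: share of nearby high-response pixels that the box covers.
    completeness = n_inside / n_context if n_context else 0.0

    return float(purity), float(completeness)

# Toy response map: a 6x6 "object" inside a 10x10 image.
response = np.zeros((10, 10))
response[2:8, 2:8] = 1.0

tight = purity_and_completeness(response, (2, 2, 8, 8))    # covers the whole object
part = purity_and_completeness(response, (2, 2, 5, 5))     # covers only a part
loose = purity_and_completeness(response, (0, 0, 10, 10))  # covers the whole image
```

In this toy setting the part-only box is pure but incomplete, because high-response pixels remain in its surroundings. The loose box is complete but impure, and only the tight box scores well on both, which matches the intuition behind using the two properties together to rank candidates.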