Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)
This work addresses the challenge of efficiently handling ambiguous images in vision systems, with applications such as object recognition for blind people. The contribution is incremental in that it builds on existing segmentation and crowdsourcing methods.
The paper tackles ambiguity in foreground object segmentation by distinguishing images that admit multiple valid segmentations from images where annotators differ only in minor ways on the same object. The authors construct the STATIC dataset and use it to build a system that predicts which images are ambiguous. Their crowdsourcing system then reduces human effort by up to 47% when collecting segmentations, with no loss in the diversity of ground truths captured.
We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish between images that lead multiple annotators to segment different foreground objects (ambiguous) and images that yield only minor inter-annotator differences for the same object. Taking images from eight widely used datasets, we crowdsource labeling the images as "ambiguous" or "not ambiguous" to segment in order to construct a new dataset we call STATIC. Using STATIC, we develop a system that automatically predicts which images are ambiguous. Experiments demonstrate the advantage of our prediction system over existing saliency-based methods on images from vision benchmarks and images taken by blind people who are trying to recognize objects in their environment. Finally, we introduce a crowdsourcing system to achieve cost savings for collecting the diversity of all valid "ground truth" foreground object segmentations by collecting extra segmentations only when ambiguity is expected. Experiments show our system eliminates up to 47% of human effort compared to existing crowdsourcing methods with no loss in capturing the diversity of ground truths.
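The idea of collecting extra segmentations only when ambiguity is expected can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the score threshold, and the annotation counts (`base`, `extra`) are all hypothetical values chosen for illustration, and the ambiguity scores are assumed to come from some predictor that outputs a value in [0, 1] per image.

```python
from typing import List, Sequence


def allocate_annotations(
    ambiguity_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
    base: int = 1,           # segmentations requested for every image (assumed)
    extra: int = 4,          # added only for images predicted ambiguous (assumed)
) -> List[int]:
    """Return how many segmentations to request for each image."""
    return [base + extra if s >= threshold else base for s in ambiguity_scores]


def effort_saved(allocation: Sequence[int], base: int = 1, extra: int = 4) -> float:
    """Fraction of annotations avoided versus requesting base + extra for every image."""
    uniform = (base + extra) * len(allocation)
    if uniform == 0:
        return 0.0
    return 1.0 - sum(allocation) / uniform


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.2, 0.7]  # made-up predictor outputs
    plan = allocate_annotations(scores)
    print(plan, effort_saved(plan))
```

In this toy setting the saving depends entirely on how many images fall below the threshold, so the numbers it prints say nothing about the 47% figure reported in the abstract.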