Can Image-Level Labels Replace Pixel-Level Labels for Image Parsing
This addresses the challenge of reducing annotation costs for image parsing in social image collections, though it is incremental as it builds on weakly supervised methods.
The paper tackles the problem of image parsing using only noisy image-level labels instead of pixel-level labels, achieving effective segmentation and category identification on two benchmark datasets.
This paper presents a weakly supervised sparse learning approach to the problem of noisily tagged image parsing, or segmenting all the objects within a noisily tagged image and identifying their categories (i.e. tags). Different from the traditional image parsing that takes pixel-level labels as strong supervisory information, our noisily tagged image parsing is provided with noisy tags of all the images (i.e. image-level labels), which is a natural setting for social image collections (e.g. Flickr). By oversegmenting all the images into regions, we formulate noisily tagged image parsing as a weakly supervised sparse learning problem over all the regions, where the initial labels of each region are inferred from image-level labels. Furthermore, we develop an efficient algorithm to solve such weakly supervised sparse learning problem. The experimental results on two benchmark datasets show the effectiveness of our approach. More notably, the reported surprising results shed some light on answering the question: can image-level labels replace pixel-level labels (hard to access) as supervisory information for image parsing.