CV
Dec 22, 2015

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

arXiv:1512.06974v2
233 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of using biased human-centric labels for computer vision tasks. The advance is rated incremental because it builds on existing noise-modeling approaches.

The paper tackles the problem of learning accurate visual classifiers from noisy, subjective human annotations by modeling the structured noise in these labels. It reports significant improvements over traditional algorithms for image classification and image captioning, in some cases doubling the performance of existing methods, on datasets including MS COCO and Yahoo Flickr 100M.

When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention. We refer to these noisy "human-centric" annotations as exhibiting human reporting bias. Examples of such annotations include image tags and keywords found on photo sharing sites, or in datasets containing image captions. In this paper, we use these noisy annotations for learning visually correct image classifiers. Such annotations do not use consistent vocabulary, and miss a significant amount of the information present in an image; however, we demonstrate that the noise in these annotations exhibits structure and can be modeled. We propose an algorithm to decouple the human reporting bias from the correct visually grounded labels. Our results are highly interpretable for reporting "what's in the image" versus "what's worth saying." We demonstrate the algorithm's efficacy along a variety of metrics and datasets, including MS COCO and Yahoo Flickr 100M. We show significant improvements over traditional algorithms for both image classification and image captioning, doubling the performance of existing methods in some cases.
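
The abstract does not spell out the model, so the following is only a minimal sketch of how "what's in the image" might be separated from "what's worth saying", assuming the decoupling is a latent-variable factorization: a hidden visual-presence variable is combined with a reporting probability conditioned on it. The function names and all numbers are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
def mention_prob(p_present, p_mention_if_present, p_mention_if_absent):
    """P(annotator mentions a concept), marginalised over latent visual presence.

    Assumed factorisation (an illustration, not the paper's stated model):
        P(mention) = P(present) * P(mention | present)
                   + P(absent)  * P(mention | absent)
    """
    return (p_present * p_mention_if_present
            + (1.0 - p_present) * p_mention_if_absent)


def presence_given_no_mention(p_present, p_mention_if_present, p_mention_if_absent):
    """P(concept is present | annotator did NOT mention it), via Bayes' rule."""
    silent_and_present = p_present * (1.0 - p_mention_if_present)
    silent_and_absent = (1.0 - p_present) * (1.0 - p_mention_if_absent)
    return silent_and_present / (silent_and_present + silent_and_absent)


# Hypothetical values: the concept is almost certainly visible in the image,
# but annotators rarely consider it worth mentioning.
p_present = 0.95             # "what's in the image"
p_mention_if_present = 0.20  # "what's worth saying"
p_mention_if_absent = 0.01   # spurious mentions

p_mention = mention_prob(p_present, p_mention_if_present, p_mention_if_absent)
p_still_there = presence_given_no_mention(
    p_present, p_mention_if_present, p_mention_if_absent
)
```

Under these assumed numbers, a missing tag is weak evidence of absence: the concept remains very likely present even when the annotator stays silent, which is the kind of structure in the noise that the abstract says can be modeled.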

