LG, CV, HC
Dec 24, 2020

Learning from Crowds by Modeling Common Confusions

arXiv:2012.13052v2
60 citations
AI Analysis

This work aims to improve the quality of machine learning models trained on noisy crowdsourced data, which is a significant problem for researchers and practitioners relying on large, cost-effective datasets.

This paper addresses the challenge of learning from crowdsourced annotations by decomposing annotation noise into common and individual components, and by attributing each annotation's confusion to one source or the other based on instance difficulty and annotator expertise. It proposes an end-to-end learning solution with shared and individual noise adaptation layers and demonstrates its effectiveness on synthesized and real-world benchmarks.

Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, annotation quality varies considerably across annotators, which imposes new challenges in learning a high-quality model from the crowdsourced annotations. In this work, we provide a new perspective that decomposes annotation noise into common noise and individual noise and differentiates the source of confusion based on instance difficulty and annotator expertise on a per-instance-annotator basis. We realize this new crowdsourcing model by an end-to-end learning solution with two types of noise adaptation layers: one is shared across annotators to capture their commonly shared confusions, and the other is specific to each annotator to capture individual confusion. To recognize the source of noise in each annotation, we use an auxiliary network to choose between the two noise adaptation layers with respect to both instances and annotators. Extensive experiments on both synthesized and real-world benchmarks demonstrate the effectiveness of our proposed common noise adaptation solution.
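
The architecture described in the abstract can be illustrated with a minimal NumPy sketch of the forward pass only. It is not the authors' implementation, and several details are assumptions made for illustration: the "choice" between the two layers is modeled as a soft gate in (0, 1) computed from the dot product of an instance embedding and an annotator embedding, the noise adaptation layers are plain K x K matrices, and all names (`noisy_label_probs`, `common_layer`, `individual_layers`, `inst_emb`, `annot_emb`) are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def noisy_label_probs(clean_probs, common_layer, individual_layers,
                      inst_emb, annot_emb):
    """Predict each annotator's (noisy) label distribution per instance.

    clean_probs:       (N, K) output of the base classifier
    common_layer:      (K, K) noise adaptation layer shared by all annotators
    individual_layers: (R, K, K) one noise adaptation layer per annotator
    inst_emb:          (N, D) instance embeddings (assumed auxiliary-network output)
    annot_emb:         (R, D) annotator embeddings (assumed auxiliary-network output)
    Returns an array of shape (N, R, K).
    """
    # Assumed gating rule: gate[i, r] near 1 attributes the noise in
    # annotation (i, r) to common confusion, near 0 to individual confusion.
    gate = sigmoid(inst_emb @ annot_emb.T)[:, :, None]           # (N, R, 1)

    common = clean_probs @ common_layer                           # (N, K)
    individual = np.einsum("nk,rkj->nrj", clean_probs,
                           individual_layers)                     # (N, R, K)

    mixed = gate * common[:, None, :] + (1.0 - gate) * individual
    return softmax(mixed, axis=-1)


# Toy example with random inputs and identity noise layers.
rng = np.random.default_rng(0)
N, K, R, D = 4, 3, 5, 8
clean_probs = softmax(rng.normal(size=(N, K)))
common_layer = np.eye(K)
individual_layers = np.stack([np.eye(K)] * R)
inst_emb = rng.normal(size=(N, D))
annot_emb = rng.normal(size=(R, D))

probs = noisy_label_probs(clean_probs, common_layer, individual_layers,
                          inst_emb, annot_emb)
```

In an end-to-end setup, the base classifier, both kinds of noise layers, and the embeddings would be trained jointly against the observed crowd labels, and only the base classifier would be used at test time. The identity initialization used in the toy example is a common choice for noise adaptation layers, but here it is likewise an assumption.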

Code Implementations (2 repos)
