Improve Learning from Crowds via Generative Augmentation
This work addresses the problem of limited model quality caused by sparse annotations in low-budget crowdsourcing, offering a practical solution for cost-effective label collection.
The paper tackles the sparsity issue in crowdsourced data by proposing a generative augmentation framework using GANs to generate high-quality annotations, achieving improved performance over state-of-the-art methods on three real-world datasets.
Crowdsourcing provides an efficient label collection schema for supervised machine learning. However, to control annotation cost, each instance in the crowdsourced data is typically annotated by only a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to directly learn a classifier by augmenting the raw sparse annotations. We implement two principles of high-quality augmentation using Generative Adversarial Networks: 1) the generated annotations should follow the distribution of authentic ones, which is measured by a discriminator; 2) the generated annotations should have high mutual information with the ground-truth labels, which is measured by an auxiliary network. Extensive experiments and comparisons against an array of state-of-the-art learning-from-crowds methods on three real-world datasets demonstrate the effectiveness of our data augmentation framework. The results also show the potential of our algorithm for low-budget crowdsourcing in general.
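The two principles can be read as a single adversarial objective. The block below is a minimal sketch in the style of InfoGAN, not the formulation used in the paper: the symbols $G$ (annotation generator), $D$ (discriminator), $Q$ (auxiliary network), $a$ (authentic annotation), $y$ (latent ground-truth label), $z$ (noise), and the weight $\lambda$ are all assumptions introduced here for illustration.

```latex
% Assumed notation (not from the source):
%   a ~ p_data       : an authentic crowdsourced annotation
%   \hat{a} = G(z,y) : a generated annotation for latent true label y and noise z
%   D(.)             : discriminator's probability that an annotation is authentic
%   Q(y | \hat{a})   : auxiliary network's posterior over the label
\begin{align}
  \min_{G,\,Q}\;\max_{D}\quad
    & \mathbb{E}_{a \sim p_{\text{data}}}\bigl[\log D(a)\bigr]
      + \mathbb{E}_{z,\,y}\bigl[\log\bigl(1 - D(G(z, y))\bigr)\bigr]
      - \lambda\, L_I(G, Q), \\
  L_I(G, Q)
    &= \mathbb{E}_{y,\;\hat{a} \sim G(\cdot \mid y)}\bigl[\log Q(y \mid \hat{a})\bigr] + H(y)
    \;\le\; I(y;\, \hat{a}).
\end{align}
```

Under these assumptions, the first two terms correspond to principle 1 (the discriminator measures whether generated annotations follow the authentic distribution), and $L_I$ is a variational lower bound on the mutual information in principle 2, estimated by the auxiliary network.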