An Active Learning Approach for Jointly Estimating Worker Performance and Annotation Reliability with Crowdsourced Data
This work addresses cost-effective, reliable dataset annotation for supervised learning. The contribution is incremental: it builds on existing active learning and crowdsourcing methods.
The paper tackles unreliable annotations from crowdsourced workers by jointly estimating worker performance, task difficulty, and annotation reliability, and by using these estimates to guide an active learning selection procedure. The reported outcome is significantly improved training accuracy and a correct ranking of worker expertise under annotation noise.
Crowdsourcing platforms offer a practical solution to the problem of affordably annotating large datasets for training supervised classifiers. Unfortunately, poor worker performance frequently threatens to compromise annotation reliability, and requesting multiple labels for every instance can lead to large cost increases without guaranteeing good results. Minimizing the required training samples using an active learning selection procedure reduces the labeling requirement but can jeopardize classifier training by focusing on erroneous annotations. This paper presents an active learning approach in which worker performance, task difficulty, and annotation reliability are jointly estimated and used to compute the risk function guiding the sample selection procedure. We demonstrate that the proposed approach, which employs active learning with Bayesian networks, significantly improves training accuracy and correctly ranks the expertise of unknown labelers in the presence of annotation noise.
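To make the idea of joint estimation concrete, the following is a minimal sketch and not the paper's method. It assumes binary labels and a simple "one-coin" worker model fitted by EM, standing in for the Bayesian network the abstract mentions. It omits task difficulty entirely, and it uses the entropy of the label posterior as a stand-in for the paper's risk function. The function names (`estimate_workers`, `label_entropy`, `select_next`) and the `-1` missing-label convention are hypothetical choices made for illustration.

```python
import numpy as np

def estimate_workers(labels, n_iter=50):
    """Jointly estimate label posteriors and worker accuracies (one-coin EM).

    labels: (n_items, n_workers) array with entries 0, 1, or -1 for "no label".
    Returns (post, acc): P(true label = 1) per item, estimated accuracy per worker.
    """
    labels = np.asarray(labels)
    observed = labels >= 0
    # Soft majority-vote initialisation: fraction of 1-votes per item.
    votes = np.where(observed, labels, 0).sum(axis=1)
    counts = np.maximum(observed.sum(axis=1), 1)
    post = votes / counts
    acc = np.full(labels.shape[1], 0.5)
    for _ in range(n_iter):
        # M-step: accuracy is the expected fraction of a worker's labels
        # that agree with the current posterior (Laplace-smoothed).
        agree = np.where(labels == 1, post[:, None], 1.0 - post[:, None])
        agree = np.where(observed, agree, 0.0)
        acc = (agree.sum(axis=0) + 1.0) / (observed.sum(axis=0) + 2.0)
        # E-step: posterior of each item's true label given worker accuracies.
        log_acc, log_err = np.log(acc), np.log1p(-acc)
        ll1 = np.where(observed, np.where(labels == 1, log_acc, log_err), 0.0).sum(axis=1)
        ll0 = np.where(observed, np.where(labels == 0, log_acc, log_err), 0.0).sum(axis=1)
        post = 1.0 / (1.0 + np.exp(ll0 - ll1))
    return post, acc

def label_entropy(post):
    """Binary entropy of the label posterior: high means unreliable annotation."""
    p = np.clip(post, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def select_next(post):
    """Pick the item whose current annotation is least reliable (entropy proxy)."""
    return int(np.argmax(label_entropy(post)))

# Toy data: workers 0 and 1 are always right, worker 2 is always wrong.
truth = np.array([1, 0, 1, 0, 1, 0, 1, 0])
labels = np.column_stack([truth, truth, 1 - truth])
# One extra item on which the two good workers disagree and worker 2 abstains.
labels = np.vstack([labels, [1, 0, -1]])

post, acc = estimate_workers(labels)
ranking = np.argsort(-acc)   # workers ordered from most to least accurate
next_item = select_next(post)
```

In this toy setup the adversarial worker ends up last in `ranking`, and the item with conflicting labels is the one chosen for further annotation. A fuller treatment along the lines of the abstract would also model per-item difficulty and feed the estimates into a classifier-aware risk function rather than plain entropy.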