Optimizing the Wisdom of the Crowd: Inference, Learning, and Teaching
This work aims to improve crowdsourcing for machine learning by enhancing label accuracy, but it appears incremental as it builds on classic aggregation models without specifying novel breakthroughs.
The paper addresses the limitations of existing crowdsourcing label aggregation models, which rely on assumptions and cannot perfectly infer ground truth, by proposing to consider workers' diverse labeling abilities and correlations. It introduces a framework tackling inference, learning, and teaching problems to better utilize rich annotation information from crowdsourced labels.
The unprecedented demand for large amounts of data has catalyzed the trend of combining human insights with machine learning techniques, which facilitates the use of crowdsourcing to collect label information both effectively and efficiently. Classic work on crowdsourcing mainly focuses on the label inference problem under the categorization setting. However, inferring the true label requires sophisticated aggregation models that usually perform well only under certain assumptions. Meanwhile, no matter how complicated the aggregation model is, the true model that generated the crowd labels remains unknown. Therefore, label inference can never recover the ground truth perfectly. Because crowdsourced labels are abundant and aggregation discards rich annotation information (e.g., which worker provided which labels), we believe that it is critical to take into consideration the diverse labeling abilities of crowdsourcing workers as well as their correlations. To address the above challenges, we propose to tackle three research problems, namely inference, learning, and teaching.
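To make the information-loss point concrete, the following is a minimal sketch, not the method proposed in this work. It uses plain majority voting as a stand-in for the aggregation models mentioned above. The toy data, the worker and item names, and the helper functions `majority_vote` and `worker_agreement` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (worker, item, label) triples for a binary task.
crowd_labels = [
    ("w1", "item_a", 1), ("w2", "item_a", 1), ("w3", "item_a", 0),
    ("w1", "item_b", 0), ("w2", "item_b", 1), ("w3", "item_b", 0),
]


def majority_vote(triples):
    """Aggregate crowd labels per item by majority vote.

    The result keeps one label per item and no longer records
    which worker provided which label.
    """
    votes = defaultdict(Counter)
    for _worker, item, label in triples:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_agreement(triples, aggregated):
    """Fraction of each worker's labels that match the aggregated label.

    This is only computable from the raw triples, i.e. from the
    per-worker information that aggregation alone throws away.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for worker, item, label in triples:
        totals[worker] += 1
        hits[worker] += int(label == aggregated[item])
    return {worker: hits[worker] / totals[worker] for worker in totals}


aggregated = majority_vote(crowd_labels)
agreement = worker_agreement(crowd_labels, aggregated)
print(aggregated)
print(agreement)
```

In this toy case the aggregated output is a single label per item, while the raw triples additionally reveal that the workers agree with the majority at different rates. That per-worker signal is a crude proxy for the diverse labeling abilities discussed above, and it is measured against the majority label rather than the unknown ground truth.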