Deep learning from crowds
This work addresses the challenge of efficiently labeling large datasets for deep learning by leveraging crowdsourcing, which is crucial for scaling supervised models in domains with limited expert annotations.
The paper tackles the problem of training deep neural networks from noisy crowdsourced labels. It proposes a general-purpose crowd layer that jointly learns the network parameters and the annotator reliabilities end-to-end using backpropagation, and it reports new state-of-the-art results on crowdsourced datasets for classification, regression, and sequence labeling.
Over the last few years, deep learning has revolutionized the field of machine learning by dramatically improving the state-of-the-art in various domains. However, as the size of supervised artificial neural networks grows, typically so does the need for larger labeled datasets. Recently, crowdsourcing has established itself as an efficient and cost-effective solution for labeling large sets of data in a scalable manner, but it often requires aggregating labels from multiple noisy contributors with different levels of expertise. In this paper, we address the problem of learning deep neural networks from crowds. We begin by describing an EM algorithm for jointly learning the parameters of the network and the reliabilities of the annotators. Then, a novel general-purpose crowd layer is proposed, which allows us to train deep neural networks end-to-end, directly from the noisy labels of multiple annotators, using only backpropagation. We empirically show that the proposed approach is able to internally capture the reliability and biases of different annotators and achieve new state-of-the-art results for various crowdsourced datasets across different settings, namely classification, regression and sequence labeling.
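To make the crowd-layer idea more concrete, the following is a minimal sketch of a forward pass and loss for the classification setting. The abstract does not specify the layer's form, so every detail here is an assumption made for illustration: each annotator gets a hypothetical weight matrix applied to the network's class probabilities, followed by a softmax, and missing labels (marked `None`) are excluded from the loss. The names `crowd_layer_forward` and `masked_cross_entropy` are invented for this example. In an automatic-differentiation framework, the gradient of this loss would flow into both the annotator matrices and the underlying network, which is what makes end-to-end training with backpropagation possible.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identity(n):
    """n x n identity matrix: an annotator assumed to agree with the network."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def crowd_layer_forward(class_probs, annotator_weights):
    """Map the shared class probabilities to one label distribution per annotator.

    Assumption: each annotator is modeled by a square matrix W, and the
    annotator-specific prediction is softmax(W @ class_probs).
    """
    outputs = []
    for W in annotator_weights:
        scores = [sum(w * p for w, p in zip(row, class_probs)) for row in W]
        outputs.append(softmax(scores))
    return outputs


def masked_cross_entropy(annotator_outputs, labels):
    """Mean negative log-likelihood over annotators who provided a label.

    A label of None means the annotator did not label this instance,
    so that annotator contributes nothing to the loss.
    """
    losses = [
        -math.log(out[y])
        for out, y in zip(annotator_outputs, labels)
        if y is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical example: 3 classes, 2 annotators.
class_probs = [0.7, 0.2, 0.1]  # output of the base network (assumed)
weights = [
    identity(3),  # annotator 0: no systematic bias
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # annotator 1: confuses classes 0 and 1
]
outputs = crowd_layer_forward(class_probs, weights)
loss = masked_cross_entropy(outputs, [0, None])  # only annotator 0 labeled this instance
```

In this sketch the second annotator's matrix swaps classes 0 and 1, so that annotator's predicted distribution peaks at class 1 even though the network itself favors class 0. This is one simple way a per-annotator transformation could absorb a systematic bias while leaving the shared network prediction untouched.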