Candidate Labeling for Crowd Learning
This paper addresses the problem of inefficient label collection in crowdsourced machine learning, though the contribution appears incremental, as it builds on the standard crowd learning scenario.
The paper tackles the inefficiency of single-label annotations in crowdsourcing by proposing candidate labeling, where annotators provide multiple labels per instance, and presents empirical evidence supporting improved knowledge extraction.
Crowdsourcing has become very popular in the machine learning community as a way to obtain labels from which a ground truth can be estimated for a given dataset. In most approaches that use crowdsourced labels, annotators are asked to provide a single class label for each presented instance. Such a request can be inefficient: because the labelers may not be experts, it may fail to take full advantage of their knowledge. In this paper, the use of candidate labeling for crowd learning is proposed, where annotators may provide more than one label per instance so as not to miss the real label. The main hypothesis is that, by allowing candidate labeling, knowledge can be extracted from the labelers more efficiently than in the standard crowd learning scenario. Empirical evidence supporting this hypothesis is presented.
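To make the idea of candidate labeling concrete, the following is a minimal sketch of one simple way candidate sets could be aggregated: each annotator casts one vote for every label in their candidate set, and the most-voted label or labels are returned. This voting rule, the function name `aggregate_candidate_votes`, and the example annotations are assumptions made for illustration; the abstract does not specify the aggregation method used in the paper.

```python
from collections import Counter


def aggregate_candidate_votes(candidate_sets):
    """Return the label(s) that appear in the most candidate sets.

    Hypothetical helper: a simple voting rule, not the paper's method.
    """
    counts = Counter()
    for labels in candidate_sets:
        # Each annotator contributes one vote to every label in their set.
        counts.update(set(labels))
    if not counts:
        return []
    top = max(counts.values())
    # Ties are kept, so the result may contain several labels.
    return sorted(label for label, n in counts.items() if n == top)


# Made-up annotations for a single instance from three annotators.
annotations = [{"cat", "lynx"}, {"cat"}, {"lynx", "cat", "fox"}]
winners = aggregate_candidate_votes(annotations)
```

With single-label annotation, an unsure annotator has to guess between "cat" and "lynx"; with candidate sets, the same annotator can report both, and the aggregate still recovers a unique winner when the sets overlap on one label.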