Optimal Inference in Crowdsourced Classification via Belief Propagation
The work provides a provably optimal inference method for crowdsourcing systems, improving label accuracy for tasks such as data annotation, though the contribution is incremental in that it builds on existing models.
The paper addresses the problem of recovering true labels from noisy crowdsourced data under the Dawid-Skene model. It introduces a tighter lower bound on the fundamental limit and proves that Belief Propagation (BP) matches this bound, which makes BP optimal. Experiments suggest that it also improves upon state-of-the-art algorithms.
Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid workers. We study the problem of recovering the true labels from the possibly erroneous crowdsourced labels under the popular Dawid-Skene model. Several algorithms have recently been proposed for this inference problem, but the best known error guarantee is still significantly larger than the fundamental limit. We close this gap by introducing a tighter lower bound on the fundamental limit and proving that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest possible, in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. Experimental results suggest that BP is close to optimal in all regimes considered and improves upon competing state-of-the-art algorithms.
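To make the setting concrete, the following is a minimal sketch of BP for a binary, one-coin version of the Dawid-Skene model, in which each worker answers every assigned task correctly with an unknown, worker-specific reliability. It is an illustration under stated assumptions, not the authors' implementation: the function name `bp_labels`, the discrete reliability prior (`prior_p`, `prior_w`), the answer-matrix encoding (+1, -1, and 0 for "not assigned"), and the simulation parameters are all hypothetical choices made for this example.

```python
import numpy as np

def bp_labels(A, prior_p, prior_w, iters=10):
    """Sketch of BP for a binary one-coin Dawid-Skene model.

    A        : (n_tasks, n_workers) array, entries +1 / -1, or 0 if unassigned.
    prior_p  : assumed support of worker reliabilities, each strictly in (0, 1).
    prior_w  : prior probabilities of those reliabilities (sums to 1).
    Returns an array of estimated labels in {+1, -1}.
    """
    n, m = A.shape
    edges = [(i, u) for i in range(n) for u in range(m) if A[i, u] != 0]
    tasks_of = {u: [i for i in range(n) if A[i, u] != 0] for u in range(m)}
    workers_of = {i: [u for u in range(m) if A[i, u] != 0] for i in range(n)}

    def lik(a, s):
        # P(answer a | true label s), as a vector over the reliability support.
        return prior_p if a == s else 1.0 - prior_p

    def log_odds(q):
        return np.log(q) - np.log1p(-q)

    # Each message is stored as the probability that the task label is +1.
    t2w = {e: 0.5 for e in edges}  # task -> worker messages
    w2t = {e: 0.5 for e in edges}  # worker -> task messages

    for _ in range(iters):
        # Worker -> task: average over the worker's reliability, using the
        # incoming messages from the worker's other tasks.
        new_w2t = {}
        for (i, u) in edges:
            weight = prior_w.astype(float)
            for j in tasks_of[u]:
                if j != i:
                    q = t2w[(j, u)]
                    weight = weight * (q * lik(A[j, u], 1)
                                       + (1.0 - q) * lik(A[j, u], -1))
            plus = np.sum(weight * lik(A[i, u], 1))
            minus = np.sum(weight * lik(A[i, u], -1))
            new_w2t[(i, u)] = plus / (plus + minus)
        w2t = new_w2t

        # Task -> worker: combine messages from the task's other workers.
        new_t2w = {}
        for (i, u) in edges:
            total = sum(log_odds(w2t[(i, v)]) for v in workers_of[i] if v != u)
            new_t2w[(i, u)] = 1.0 / (1.0 + np.exp(-total))
        t2w = new_t2w

    # Final belief per task: combine all incoming worker messages.
    belief = np.array([sum(log_odds(w2t[(i, u)]) for u in workers_of[i])
                       for i in range(n)])
    return np.where(belief >= 0, 1, -1)


# Small simulated instance (all parameters below are illustrative).
rng = np.random.default_rng(0)
prior_p = np.array([0.5, 0.9])   # assumed "spammer" and "reliable" workers
prior_w = np.array([0.5, 0.5])
n_tasks, n_workers = 30, 10
truth = rng.choice([-1, 1], size=n_tasks)
reliab = rng.choice(prior_p, size=n_workers, p=prior_w)
assigned = rng.random((n_tasks, n_workers)) < 0.5
correct = rng.random((n_tasks, n_workers)) < reliab
A = np.where(correct, truth[:, None], -truth[:, None]) * assigned

bp_est = bp_labels(A, prior_p, prior_w)
mv_est = np.where(A.sum(axis=1) >= 0, 1, -1)  # majority vote, ties -> +1
print("BP accuracy:", np.mean(bp_est == truth))
print("Majority-vote accuracy:", np.mean(mv_est == truth))
```

The sketch stores each message as a single probability and averages over a finite reliability support, which keeps the worker-to-task update a short sum. A continuous prior would require numerical integration instead, and on a small, dense assignment graph like this one BP carries no optimality guarantee, so the printed accuracies are only indicative.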