MLJun 15, 2014

Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing

arXiv:1406.3824v3405 citations
Originality Incremental advance
AI Analysis

This provides a provably optimal solution for crowdsourcing label inference, addressing a key bottleneck in data collection, though it is incremental as it builds on existing Dawid-Skene estimator methods.

The paper tackles the problem of inferring true labels from noisy crowdsourced data by proposing a two-stage algorithm combining spectral methods and EM, achieving optimal convergence rate up to a logarithmic factor and performing comparably to the most accurate empirical approach in experiments.

Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes