Exact Exponent in Optimal Rates for Crowdsourcing
This work addresses the problem of efficiently aggregating labels from non-expert workers in machine learning applications, offering a theoretical foundation for optimal error rates, but it is incremental as it builds on existing models and algorithms.
The paper establishes matching upper and lower bounds for the optimal error rate in crowdsourcing under the Dawid-Skene model, deriving an exact exponent mI(π) that characterizes the collective ability of workers and providing a precise sample size requirement m > (1/I(π)) log(1/ε) to achieve an ε misclassification error.
In many machine learning applications, crowdsourcing has become the primary means for label collection. In this paper, we study the optimal error rate for aggregating labels provided by a set of non-expert workers. Under the classic Dawid-Skene model, we establish matching upper and lower bounds with an exact exponent $mI(π)$ in which $m$ is the number of workers and $I(π)$ the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m>\frac{1}{I(π)}\log\frac{1}ε$ in order to achieve an $ε$ misclassification error. In addition, our results imply the optimality of various EM algorithms for crowdsourcing initialized by consistent estimators.