Minimax Optimal Convergence Rates for Estimating Ground Truth from Crowdsourced Labels
This work gives theoretical validation for a widely used crowdsourcing method, which matters to machine learning researchers and practitioners who work with noisy label data.
The paper addresses the problem of estimating true labels from noisy crowdsourced data. It proves convergence rates for a projected EM algorithm for the Dawid-Skene estimator, shows via a lower bound that the exponent in the rate is optimal, and thereby settles whether the estimator has sound theoretical guarantees to match its practical performance.
Crowdsourcing has become a primary means of label collection in many real-world machine learning applications. A classical method for inferring the true labels from the noisy labels provided by crowdsourcing workers is the Dawid-Skene estimator. In this paper, we prove convergence rates of a projected EM algorithm for the Dawid-Skene estimator. The exponent in the resulting rate of convergence is shown to be optimal via a lower bound argument. Our work resolves the long-standing question of whether the Dawid-Skene estimator has sound theoretical guarantees in addition to the good performance observed in practice. In addition, a comparative study with majority voting illustrates both the advantages and the pitfalls of the Dawid-Skene estimator.
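To make the two estimators compared above concrete, the following is a minimal sketch, not the algorithm analyzed in the paper. It assumes binary labels, every worker labeling every item, a uniform class prior, and a simplified "one-coin" model in which each worker has a single accuracy parameter. The function names `majority_vote` and `dawid_skene_one_coin` are hypothetical, and clipping the accuracies to `[eps, 1 - eps]` is only an illustrative stand-in for a projection step, since the abstract does not specify the projection.

```python
import math


def majority_vote(labels):
    """labels[i][j] in {0, 1} is the label worker j gave item i. Ties go to 0."""
    return [1 if 2 * sum(row) > len(row) else 0 for row in labels]


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dawid_skene_one_coin(labels, n_iter=50, eps=1e-6):
    """Plain EM for a binary one-coin model with a uniform class prior.

    Returns (hard_labels, worker_accuracies). Clipping accuracies to
    [eps, 1 - eps] is an assumption made here, not the paper's projection.
    """
    n_items, n_workers = len(labels), len(labels[0])
    # Initialise the posterior P(y_i = 1) with the fraction of 1-votes.
    q = [sum(row) / n_workers for row in labels]
    acc = [0.5] * n_workers
    for _ in range(n_iter):
        # M-step: expected fraction of items on which each worker is correct.
        for j in range(n_workers):
            agree = sum(q[i] if labels[i][j] == 1 else 1.0 - q[i]
                        for i in range(n_items))
            acc[j] = min(max(agree / n_items, eps), 1.0 - eps)
        # E-step: each vote shifts the item's log-odds by the
        # log-odds that the voting worker is correct.
        for i in range(n_items):
            log_odds = sum((1 if labels[i][j] == 1 else -1)
                           * math.log(acc[j] / (1.0 - acc[j]))
                           for j in range(n_workers))
            q[i] = _sigmoid(log_odds)
    return [1 if qi > 0.5 else 0 for qi in q], acc
```

Majority voting weights every worker equally, whereas the EM sketch weights each vote by the estimated log-odds that the worker is correct. That reweighting is the source of the potential advantage when worker quality varies, and it also means the result depends on how well the accuracies are estimated.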