HC LG ML Apr 25, 2019

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers

Yao Ma, Alex Olshevsky, Venkatesh Saligrama, Csaba Szepesvari

arXiv:1904.11608v2 21.428 citations

Originality: Incremental advance

AI Analysis

This paper addresses worker skill estimation in crowdsourcing, with applications such as data labeling. The contribution appears incremental: it builds on the existing single-coin Dawid-Skene model and adds a new optimization approach.

The paper tackles the problem of estimating worker skills in crowdsourcing when worker assignments are sparse and irregular. It formulates skill estimation as a rank-one matrix completion problem and shows that skills are identifiable if and only if the sampling matrix has no bipartite connected component. The authors propose a projected gradient descent scheme that converges to the global optima and achieves state-of-the-art performance on real-world datasets.
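The identifiability condition above can be checked mechanically. The following is a minimal sketch, not code from the paper: it assumes the sampling pattern is given as an undirected graph on workers, with an edge between two distinct workers whenever their label correlation is observed. The function name `has_bipartite_component` and the edge-list input are hypothetical choices for illustration. Each connected component is 2-colored by breadth-first search; a component is bipartite exactly when no edge joins two nodes of the same color.

```python
from collections import deque


def has_bipartite_component(n, edges):
    """Return True if the graph on nodes 0..n-1 has a bipartite connected component.

    `edges` is an iterable of (i, j) pairs with i != j (assumed input format).
    An isolated node counts as a (trivially) bipartite component.
    """
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue  # node already belongs to a visited component
        color[start] = 0
        queue = deque([start])
        bipartite = True
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    bipartite = False  # odd cycle found in this component
        if bipartite:
            return True
    return False


# A triangle contains an odd cycle; a three-node path does not.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
print(has_bipartite_component(3, triangle), has_bipartite_component(3, path))
```

For intuition: on the path 0-1-2 only the products of skills for pairs (0, 1) and (1, 2) are observed, so scaling worker 1's skill by a constant and the other two by its inverse leaves every observation unchanged. An odd cycle removes that freedom.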

We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) does not have a bipartite connected component. We then propose a projected gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets.
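To make the formulation concrete, here is a minimal sketch of projected gradient descent for rank-one completion. It is an illustration under stated assumptions, not the authors' implementation. It assumes skills are a vector s with entries in [-1, 1], that the observed correlation between workers i and j equals s_i s_j, and that the objective is the squared error over observed off-diagonal entries. The function name, step size, iteration count, and positive initialization are all hypothetical choices; the paper's exact objective and projection set may differ.

```python
import numpy as np


def projected_gradient_descent(C, mask, step=0.01, iters=20000, seed=0):
    """Minimize sum over observed (i, j) of (s_i * s_j - C_ij)^2 with s in [-1, 1]^n.

    C    : (n, n) symmetric array of correlations (only observed entries are used).
    mask : (n, n) symmetric 0/1 array with zero diagonal marking observed entries.
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    # Assumed initialization: positive skills, which fixes the global sign of s.
    s = rng.uniform(0.1, 1.0, size=n)
    for _ in range(iters):
        residual = mask * (np.outer(s, s) - C)  # error on observed entries only
        grad = 4.0 * residual @ s               # gradient of the objective (symmetric residual)
        s = np.clip(s - step * grad, -1.0, 1.0)  # gradient step, then project onto the box
    return s


# Noiseless toy instance: two triangles sharing an edge, so the single
# connected component of the sampling graph is not bipartite.
true_s = np.array([0.8, 0.6, 0.4, 0.7])
mask = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (0, 3), (1, 3)]:
    mask[i, j] = mask[j, i] = 1.0
C = mask * np.outer(true_s, true_s)

est = projected_gradient_descent(C, mask)
print(np.round(est, 3))
```

The products s_i s_j are unchanged when every skill flips sign, so some convention is needed to pick one of the two solutions; the positive initialization here is one simple way to do that for this toy instance.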
