CR · DB · Aug 24, 2018

Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy

arXiv:1808.08181v1 · 29 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns in crowdsourcing for tasks like data labeling, but it is incremental as it builds on existing LDP techniques with a focus on sparse data scenarios.

The paper tackles the problem of inferring true answers from sparse crowdsourcing data while protecting worker privacy under local differential privacy (LDP). Its experiments show that the proposed matrix factorization algorithm achieves higher accuracy than existing LDP methods on such data.

Crowdsourcing has emerged as a problem-solving paradigm for tasks that are difficult for computers but easy for humans. However, since the answers collected from the recruited participants (workers) may contain sensitive information, crowdsourcing raises serious privacy concerns. In this paper, we investigate the problem of protecting answer privacy under local differential privacy (LDP), by which individual workers randomize their answers independently and send the perturbed answers to the task requester. The utility goal is to enable the requester to infer the true answer (i.e., the truth) from the perturbed data with high accuracy. One challenge of LDP perturbation is the sparsity of worker answers (i.e., each worker answers only a small number of tasks). Simple extensions of existing approaches (e.g., Laplace perturbation and randomized response) may incur large truth inference error on sparse data. We therefore design an efficient new matrix factorization (MF) algorithm under LDP. We prove that our MF algorithm provides both an LDP guarantee and small truth inference error, regardless of the sparsity of worker answers. We perform extensive experiments on real-world and synthetic datasets and demonstrate that the MF algorithm outperforms existing LDP algorithms on sparse crowdsourcing data.
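To make the LDP setting in the abstract concrete, here is a minimal sketch of one baseline it names, randomized response, applied to a single task with categorical answers. This is not the paper's MF algorithm. The function names (`randomized_response`, `infer_truth`), the single-task setup, and the debiased majority vote are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def randomized_response(answer, choices, epsilon, rng=random):
    """Perturb one worker's answer locally (k-ary randomized response).

    The true answer is kept with probability e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 choices is reported uniformly at random.
    The ratio between any two report probabilities is at most e^eps,
    which is the epsilon-LDP condition.
    """
    k = len(choices)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return answer
    others = [c for c in choices if c != answer]
    return rng.choice(others)


def infer_truth(reports, choices, epsilon):
    """Debias the perturbed counts and return (most likely answer, estimates).

    Hypothetical helper: a simple debiased majority vote, assumed here
    as a stand-in for a real truth inference method.
    """
    k = len(choices)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)  # probability of reporting the true answer
    q = 1 / (e + k - 1)  # probability of reporting one specific other answer
    counts = Counter(reports)
    # Unbiased estimate of how many workers truly gave each answer.
    estimates = {c: (counts[c] - n * q) / (p - q) for c in choices}
    return max(estimates, key=estimates.get), estimates


if __name__ == "__main__":
    rng = random.Random(0)
    choices = ["A", "B", "C"]
    # Assumed toy data: 2000 workers who all truly answer "B".
    reports = [randomized_response("B", choices, 1.0, rng) for _ in range(2000)]
    truth, estimates = infer_truth(reports, choices, 1.0)
    print(truth, estimates)
```

With many workers per task, the debiased counts concentrate around the true counts. When each worker answers only a few tasks, each task receives few reports and the estimates become noisy, which is the sparsity problem the abstract describes.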

