Haipei Sun

CR Jan 24, 2020

Privacy for All: Demystify Vulnerability Disparity of Differential Privacy against Membership Inference Attack

Bo Zhang, Ruotong Yu, Haipei Sun et al.

Machine learning algorithms, when applied to sensitive data, pose a potential threat to privacy. A growing body of prior work has demonstrated that membership inference attacks (MIAs) can disclose specific private information in the training data to an attacker. Meanwhile, the algorithmic fairness of machine learning has increasingly attracted attention from both academia and industry. Algorithmic fairness ensures that machine learning models do not discriminate against a particular demographic group of individuals (e.g., black and female people). Given that an MIA is itself a learning model, it raises a serious concern as to whether MIA treats all groups of individuals "fairly", that is, whether a particular group is more vulnerable to MIA than other groups. This paper examines the algorithmic fairness issue in the context of MIA and its defenses. First, for fairness evaluation, it formalizes the notion of vulnerability disparity (VD) to quantify the difference in how MIA treats different demographic groups. Second, it evaluates VD on four real-world datasets and shows that VD indeed exists in these datasets. Third, it examines the impact of differential privacy (DP), as a defense mechanism against MIA, on VD. The results show that although DP significantly changes VD, it cannot eliminate VD completely. Fourth, it therefore designs a new mitigation algorithm named FAIRPICK to reduce VD. An extensive set of experimental results demonstrates that FAIRPICK can effectively reduce VD both with and without DP deployed.
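The abstract does not give the formal definition of VD, so the following is only a rough illustration of the general idea of comparing how an attack treats different groups. It assumes, purely for the sake of the sketch, that disparity is measured as the gap between the highest and lowest per-group attack accuracy; the function names and this particular metric are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def per_group_attack_accuracy(groups, is_member, attack_guess):
    """Fraction of records per group whose membership the attack guessed correctly.

    groups       -- demographic group label for each record
    is_member    -- 1 if the record was in the training set, else 0
    attack_guess -- the attack's membership prediction (1 or 0)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, guess in zip(groups, is_member, attack_guess):
        total[group] += 1
        correct[group] += int(truth == guess)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(accuracy_by_group):
    """Assumed disparity measure: largest minus smallest per-group accuracy."""
    values = list(accuracy_by_group.values())
    return max(values) - min(values)


# Toy data: the attack is always right on group "a", half right on group "b".
groups = ["a", "a", "b", "b"]
is_member = [1, 0, 1, 0]
attack_guess = [1, 0, 1, 1]

accuracy = per_group_attack_accuracy(groups, is_member, attack_guess)
gap = accuracy_gap(accuracy)
```

A gap of zero would mean the attack is equally accurate on every group under this assumed measure; a larger gap would mean some group is more exposed than another.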

CR Aug 24, 2018

Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy

Haipei Sun, Boxiang Dong, Hui et al.

Crowdsourcing has arisen as a new problem-solving paradigm for tasks that are difficult for computers but easy for humans. However, since the answers collected from the recruited participants (workers) may contain sensitive information, crowdsourcing raises serious privacy concerns. In this paper, we investigate the problem of protecting answer privacy under local differential privacy (LDP), by which individual workers randomize their answers independently and send the perturbed answers to the task requester. The utility goal is to infer the true answer (i.e., the truth) from the perturbed data with high accuracy. One of the challenges of LDP perturbation is the sparsity of worker answers (i.e., each worker answers only a small number of tasks). Simple extensions of existing approaches (e.g., Laplace perturbation and randomized response) may incur large truth-inference error on sparse data. We therefore design an efficient new matrix factorization (MF) algorithm under LDP. We prove that our MF algorithm provides both an LDP guarantee and small truth-inference error, regardless of the sparsity of worker answers. We perform extensive experiments on real-world and synthetic datasets and demonstrate that the MF algorithm outperforms existing LDP algorithms on sparse crowdsourcing data.
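For readers unfamiliar with the baseline mentioned above, here is a minimal sketch of textbook binary randomized response under LDP. It is not the paper's MF algorithm: each worker keeps a yes/no answer with probability e^ε / (1 + e^ε) and flips it otherwise, and the requester debiases the aggregate. The function names and the single-task, binary-answer setting are assumptions made for illustration.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(answer: int, epsilon: float, rng: random.Random) -> int:
    """Worker side: report the true bit with probability p, the flipped bit otherwise."""
    p = keep_probability(epsilon)
    return answer if rng.random() < p else 1 - answer


def estimate_true_fraction(reports, epsilon: float) -> float:
    """Requester side: unbiased estimate of the fraction of true 1-answers.

    E[report] = f * (2p - 1) + (1 - p), so we invert that relation.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Toy run: 20,000 hypothetical workers, 30% of whom truly answer 1.
rng = random.Random(0)
true_answers = [1] * 6000 + [0] * 14000
reports = [randomized_response(a, 1.0, rng) for a in true_answers]
estimate = estimate_true_fraction(reports, 1.0)
```

This estimator works when many workers answer the same task. With sparse answers each task receives few reports, so the debiased estimate becomes noisy, which is the difficulty the abstract describes.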
