CRLGDec 25, 2018

Privacy-Preserving Collaborative Deep Learning with Unreliable Participants

arXiv:1812.10113v325 citations
Originality Incremental advance
AI Analysis

This addresses privacy concerns for users in collaborative settings, such as those with sensitive data like medical records, though it is incremental as it builds on existing privacy-preserving techniques.

The paper tackles the problem of privacy leakage in collaborative deep learning by proposing a system that allows participants to cooperatively build a model without sharing data or storing it centrally, using differential privacy and handling unreliable participants, achieving high accuracy close to centralized training while ensuring privacy.

With powerful parallel computing GPUs and massive user data, neural-network-based deep learning can well exert its strong power in problem modeling and solving, and has archived great success in many applications such as image classification, speech recognition and machine translation etc. While deep learning has been increasingly popular, the problem of privacy leakage becomes more and more urgent. Given the fact that the training data may contain highly sensitive information, e.g., personal medical records, directly sharing them among the users (i.e., participants) or centrally storing them in one single location may pose a considerable threat to user privacy. In this paper, we present a practical privacy-preserving collaborative deep learning system that allows users to cooperatively build a collective deep learning model with data of all participants, without direct data sharing and central data storage. In our system, each participant trains a local model with their own data and only shares model parameters with the others. To further avoid potential privacy leakage from sharing model parameters, we use functional mechanism to perturb the objective function of the neural network in the training process to achieve $ε$-differential privacy. In particular, for the first time, we consider the existence of~\textit{unreliable participants}, i.e., the participants with low-quality data, and propose a solution to reduce the impact of these participants while protecting their privacy. We evaluate the performance of our system on two well-known real-world datasets for regression and classification tasks. The results demonstrate that the proposed system is robust against unreliable participants, and achieves high accuracy close to the model trained in a traditional centralized manner while ensuring rigorous privacy protection.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes