LGNov 7, 2024
Differential Privacy in Continual Learning: Which Labels to Update?Marlon Tobaben, Talal Alrawajfeh, Marcus Klasson et al.
The goal of continual learning (CL) is to retain knowledge across tasks, but this conflicts with strict privacy required for sensitive training data that prevents storing or memorising individual samples. To address that, we combine CL and differential privacy (DP). We highlight that failing to account for privacy leakage through the set of labels a model can output can break the privacy of otherwise valid DP algorithms. This is especially relevant in CL. We show that mitigating the issue with a data-independent overly large label space can have minimal negative impact on utility when fine-tuning a pre-trained model under DP, while learning the labels with a separate DP mechanism risks losing small classes.
QMJan 29, 2019
Representation Transfer for Differentially Private Drug Sensitivity PredictionTeppo Niinimäki, Mikko Heikkilä, Antti Honkela et al.
Motivation: Human genomic datasets often contain sensitive information that limits use and sharing of the data. In particular, simple anonymisation strategies fail to provide sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has reached promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee the privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) needs to also avoid leaking private information. Results: We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, PCA and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification, and drug sensitivity prediction. The experiments demonstrate significant benefit from all representation learning methods with variational autoencoders providing the most accurate predictions most often. Our results significantly improve over previous state-of-the-art in accuracy of differentially private drug sensitivity prediction.
MLMar 3, 2017
Differentially Private Bayesian Learning on Distributed DataMikko Heikkilä, Eemil Lagerspetz, Samuel Kaski et al.
Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects. Differential privacy (DP) has become established as a standard for protecting learning results. The standard DP algorithms require a single trusted party to have access to the entire data, which is a clear weakness. We consider DP Bayesian learning in a distributed setting, where each party only holds a single sample or a few samples of the data. We propose a learning strategy based on a secure multi-party sum function for aggregating summaries from data holders and the Gaussian mechanism for DP. Our method builds on an asymptotically optimal and practically efficient DP Bayesian inference with rapidly diminishing extra cost.