LG CR IT · Jun 15, 2023

Training generative models from privatized data

Daria Reshetova, Wei-Ning Chen, Ayfer Özgür

arXiv:2306.09547v2 · 3.86 citations · h-index: 29

Originality: Highly original

AI Analysis

This work is relevant to machine learning practitioners who collect data under privacy constraints. It offers a novel method that mitigates both privatization noise and the curse of dimensionality in statistical convergence.

The paper tackles the problem of training generative models on locally differentially privatized data, showing that entropic regularization of optimal transport enables GANs to learn the raw data distribution from privatized samples and achieve fast statistical convergence at the parametric rate.
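Entropic optimal transport between two sets of samples is typically computed with Sinkhorn iterations. The following is a minimal NumPy sketch, assuming squared Euclidean cost, uniform weights on both empirical samples, and log-domain updates for numerical stability. The function name `entropic_ot` and the defaults for `eps` and `n_iter` are hypothetical. This is a generic Sinkhorn solver for illustration, not the authors' implementation or training loss.

```python
import numpy as np


def _logsumexp(m: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(m))) along `axis`."""
    mx = np.max(m, axis=axis, keepdims=True)
    return np.squeeze(mx, axis=axis) + np.log(np.sum(np.exp(m - mx), axis=axis))


def entropic_ot(x: np.ndarray, y: np.ndarray, eps: float = 0.1, n_iter: int = 500):
    """Entropic OT between uniform empirical measures on x (n, d) and y (m, d).

    Hypothetical helper: squared Euclidean cost, log-domain Sinkhorn.
    Returns (plan, cost) where cost = <plan, C> is the transport cost
    of the entropically regularized plan.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise cost matrix C[i, j] = ||x_i - y_j||^2
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))  # log of uniform weights on x
    log_b = np.full(m, -np.log(m))  # log of uniform weights on y
    f = np.zeros(n)  # dual potential attached to x
    g = np.zeros(m)  # dual potential attached to y
    for _ in range(n_iter):
        # Update f so that the row marginals of the plan equal a
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        # Update g so that the column marginals of the plan equal b
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    # Optimal plan: P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    plan = np.exp(log_plan)
    return plan, float(np.sum(plan * cost))
```

A larger `eps` smooths the plan and makes the iterations converge faster; as `eps` approaches 0, the cost approaches the unregularized squared Wasserstein distance, but more iterations are needed.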

Local differential privacy is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of optimal transport, a popular regularization method that has often been leveraged for its computational benefits, enables the generator to learn the raw (unprivatized) data distribution even though it only has access to privatized samples. We prove that this regularization simultaneously yields fast statistical convergence at the parametric rate. This shows that entropic regularization of optimal transport uniquely mitigates both the effects of privatization noise and the curse of dimensionality in statistical convergence. We provide experimental evidence to support the efficacy of our framework in practice.
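The abstract does not specify which privatization mechanism is used. As a hedged illustration of local privatization, here is a sketch of the standard Gaussian mechanism applied independently to each user's record before it leaves the device. The Gaussian choice, the L2 clipping bound `clip`, and the function name `privatize_gaussian` are assumptions for illustration, not necessarily the mechanism studied in the paper.

```python
import math

import numpy as np


def privatize_gaussian(x: np.ndarray, clip: float, epsilon: float, delta: float,
                       rng: np.random.Generator) -> tuple[np.ndarray, float]:
    """Hypothetical local privatizer: each row of x is one user's record.

    Each record is clipped to L2 norm <= clip, so any two possible records
    differ by at most 2 * clip in L2 norm (the sensitivity). Adding
    N(0, sigma^2 I) with the classic Gaussian-mechanism calibration
    sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon gives
    (epsilon, delta)-local DP for 0 < epsilon < 1.
    Returns the privatized records and the noise scale sigma.
    """
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("this calibration requires 0 < epsilon < 1 and 0 < delta < 1")
    # Scale down any record whose L2 norm exceeds `clip`
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    clipped = x * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    sensitivity = 2.0 * clip
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    # Independent noise per record and coordinate; only this output is collected
    return clipped + rng.normal(0.0, sigma, size=x.shape), sigma
```

In this setting, a GAN would only ever see the returned samples. The noise scale `sigma` grows as `epsilon` shrinks, which is the privatization noise the framework aims to undo.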

