ML, CR, LG | Oct 18, 2022

Differentially Private Diffusion Models

arXiv:2210.09929v3 | 147 citations | h-index: 29
Originality: Incremental advance
AI Analysis

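This work addresses privacy concerns in data-limited domains by enabling synthetic data generation under a differential privacy guarantee, with classifiers trained on the synthetic data performing on par with task-specific DP-SGD-trained classifiers. The advance is incremental, since it builds on existing diffusion models and DP-SGD.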

The paper tackles the challenge of training generative models on sensitive data by introducing Differentially Private Diffusion Models (DPDMs), which enforce privacy with DP-SGD. DPDMs achieve state-of-the-art performance on image generation benchmarks, and classifiers trained on their synthetic data perform on par with task-specific DP-SGD-trained classifiers.

While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge, providing access to synthetic data instead. We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs), which enforce privacy using differentially private stochastic gradient descent (DP-SGD). We investigate the DM parameterization and the sampling algorithm, which turn out to be crucial ingredients in DPDMs, and propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments. Moreover, on standard benchmarks, classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers, which has not been demonstrated before for DP generative models. Project page and code: https://nv-tlabs.github.io/DPDM.
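To make the combination of DP-SGD and noise multiplicity described in the abstract more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the elementwise linear "denoiser", the log-normal noise-level distribution, and all function and parameter names (`per_example_grad`, `dp_sgd_step`, `K`, `clip_norm`, `noise_multiplier`) are hypothetical. The idea it tries to show is that each example's gradient is averaged over `K` noise draws before clipping, so the averaging reduces gradient variance while each example still contributes a single clipped gradient.

```python
import numpy as np


def per_example_grad(theta, x, rng, K):
    """Gradient of a toy denoising loss for ONE example, averaged over K noise draws.

    Toy model (an assumption for illustration): denoiser D(x_noisy) = theta * x_noisy,
    loss = ||D(x_noisy) - x||^2.
    """
    grad = np.zeros_like(theta)
    for _ in range(K):
        sigma = np.exp(rng.normal(-1.0, 1.0))  # hypothetical noise-level distribution
        x_noisy = x + sigma * rng.normal(size=x.shape)
        residual = theta * x_noisy - x
        grad += 2.0 * residual * x_noisy  # d/dtheta of the squared error
    return grad / K  # noise multiplicity: average over the K draws


def clip(g, clip_norm):
    """Rescale g so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, clip_norm / (norm + 1e-12))


def dp_sgd_step(theta, batch, rng, K=4, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    """One DP-SGD update: per-example clipping, then Gaussian noise on the sum."""
    clipped = [clip(per_example_grad(theta, x, rng, K), clip_norm) for x in batch]
    summed = np.sum(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=theta.shape)
    return theta - lr * (summed + noise) / len(batch)


rng = np.random.default_rng(0)
theta = np.zeros(3)
batch = rng.normal(size=(8, 3))  # stand-in for a batch of sensitive data
theta = dp_sgd_step(theta, batch, rng)
```

The sketch omits privacy accounting entirely; a real training run would track the cumulative privacy budget across steps with a dedicated accountant.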

Code Implementations: 1 repo
