LG, CV | Dec 13, 2024

Leveraging Programmatically Generated Synthetic Data for Differentially Private Diffusion Training

arXiv:2412.09842v1 | 2 citations | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses privacy-preserving generative modeling for applications like medical imaging, but it is incremental as it builds on existing synthetic data methods for differential privacy.

The paper tackles the challenge of using programmatically generated synthetic data for differentially private diffusion training, where the mismatch between synthetic and real distributions often leads to unrealistic images. It proposes DP-SynGen, which uses synthetic data only in specific diffusion stages, reducing the privacy budget and improving the quality of the generated data.

Programmatically generated synthetic data has been used in differentially private training for classification to enhance performance without privacy leakage. However, because the synthetic data is generated by a random process, its distribution is clearly distinguishable from that of real data, and what a model learns from it is difficult to transfer. As a result, a model trained on the synthetic data generates unrealistic random images, which makes it challenging to adapt synthetic data for generative models. In this work, we propose DP-SynGen, which leverages programmatically generated synthetic data in diffusion models to address this challenge. By exploiting the three stages of diffusion models (coarse, context, and cleaning), we identify the stages where synthetic data can be used effectively. We theoretically and empirically verify that the cleaning and coarse stages can be trained without private data, using synthetic data in its place to reduce the privacy budget. The experimental results show that DP-SynGen improves the quality of the generated data by mitigating the negative impact of privacy-induced noise on the generation process.
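The stage-based split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the stage boundaries (`clean_end`, `coarse_start`), and the mapping of low-noise timesteps to the cleaning stage and high-noise timesteps to the coarse stage are all assumptions made for illustration. The sketch only shows the routing idea, namely that private data would be needed solely for timesteps in the context stage.

```python
from typing import List, Literal

Stage = Literal["cleaning", "context", "coarse"]


def stage_for_timestep(t: int, clean_end: int, coarse_start: int) -> Stage:
    """Map a diffusion timestep to one of three stages.

    Assumption: small t (little noise) is the cleaning stage and
    large t (heavy noise) is the coarse stage. The boundaries are
    hypothetical and are not taken from the paper.
    """
    if t < clean_end:
        return "cleaning"
    if t >= coarse_start:
        return "coarse"
    return "context"


def data_source(stage: Stage) -> str:
    """Per the abstract, only the context stage relies on private data."""
    return "private" if stage == "context" else "synthetic"


def private_timesteps(num_steps: int, clean_end: int, coarse_start: int) -> List[int]:
    """List the timesteps whose training would touch private data."""
    return [
        t
        for t in range(num_steps)
        if data_source(stage_for_timestep(t, clean_end, coarse_start)) == "private"
    ]


if __name__ == "__main__":
    # Illustrative values only: 1000 steps with hypothetical boundaries.
    steps = private_timesteps(num_steps=1000, clean_end=100, coarse_start=700)
    print(f"{len(steps)} of 1000 timesteps would use private data")
```

Under these illustrative boundaries, differentially private training would be applied only to the timesteps returned by `private_timesteps`, while the remaining timesteps would be trained on synthetic data at no privacy cost.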

Code Implementations: 1 repo
