Private Evolution Converges
This work addresses the problem of ensuring reliable convergence for differentially private synthetic data generation methods, particularly for researchers and practitioners in privacy-preserving machine learning, though it is incremental as it builds on prior theoretical analyses.
The paper tackled the inconsistent behavior of Private Evolution (PE) for differentially private synthetic data generation by developing a new theoretical framework to prove its convergence under realistic conditions, showing that PE produces synthetic data with expected 1-Wasserstein distance of ˜O(d(nε)^{-1/d}) from the original dataset.
Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm's behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to understand PE's practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a convex and compact domain, we prove that under the right hyperparameter settings and given access to the Gaussian variation API proposed in \cite{PE23}, PE produces an $(\varepsilon, δ)$-DP synthetic dataset with expected 1-Wasserstein distance $\tilde{O}(d(n\varepsilon)^{-1/d})$ from the original; this establishes worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in experiments.