CR · CL · CV · Mar 26, 2025

Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead

arXiv:2503.20846v1 · 7 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of generating useful synthetic data with formal privacy guarantees for high-stakes domains, but its contribution is incremental: it synthesizes existing work and identifies gaps.

This survey examines privacy-preserving synthetic data generation and shows that leading methods degrade significantly under realistic privacy constraints (ε ≤ 4) on domain-specific datasets.

Privacy-preserving synthetic data offers a promising solution to harness segregated data in high-stakes domains where information is compartmentalized for regulatory, privacy, or institutional reasons. This survey provides a comprehensive framework for understanding the landscape of privacy-preserving synthetic data, presenting the theoretical foundations of generative models and differential privacy, followed by a review of state-of-the-art methods across tabular data, images, and text. Our synthesis of evaluation approaches highlights the fundamental trade-off between utility for downstream tasks and privacy guarantees, while identifying critical research gaps: the lack of realistic benchmarks representing specialized domains and the insufficient empirical evaluations required to contextualise formal guarantees. Through empirical analysis of four leading methods on five real-world datasets from specialized domains, we demonstrate significant performance degradation under realistic privacy constraints (ε ≤ 4), revealing a substantial gap between results reported on general-domain benchmarks and performance on domain-specific data. Our findings highlight key challenges including unaccounted privacy leakage, insufficient empirical verification of formal guarantees, and a critical deficit of realistic benchmarks. These challenges underscore the need for robust evaluation frameworks, standardized benchmarks for specialized domains, and improved techniques to address the unique requirements of privacy-sensitive fields, so that this technology can deliver on its considerable potential.
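To make the privacy budget ε in the abstract more concrete, the following is a minimal illustrative sketch, not a method from the survey. It assumes pure ε-differential privacy and a simple counting query with sensitivity 1, answered with the textbook Laplace mechanism. The function names (`max_likelihood_ratio`, `laplace_noise`, `private_count`) are hypothetical and chosen only for this example. Floating-point Laplace sampling as written here is for intuition only and should not be treated as a production-grade private release.

```python
import math
import random


def max_likelihood_ratio(epsilon: float) -> float:
    """Upper bound e^epsilon on how much one record can change the
    probability of any output under pure epsilon-DP."""
    return math.exp(epsilon)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponential variables with mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon.
    Assumption: adding or removing one record changes the count by at most 1."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for eps in (0.5, 1.0, 4.0):
        noisy = private_count(1000, eps, rng)
        print(f"epsilon={eps}: ratio bound={max_likelihood_ratio(eps):.1f}, "
              f"noisy count={noisy:.2f}")
```

Under these assumptions, a smaller ε means larger noise (scale 1/ε) and a tighter bound on what any single record can reveal. At ε = 4 the bound e^ε is roughly 54.6, which gives a sense of how much room the guarantee still leaves at the upper end of the range the abstract calls realistic.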
