Benchmarking Differentially Private Synthetic Data Generation Algorithms
This work provides a comparative analysis for researchers and practitioners in privacy-preserving data generation, but it is incremental as it benchmarks existing methods without introducing new algorithms.
The authors systematically benchmarked differentially private synthetic data generation algorithms for tabular data, evaluating utility through distribution preservation, correlation, and classification accuracy, and identified top-performing and underperforming algorithms.
This work presents a systematic benchmark of differentially private synthetic data generation algorithms for tabular data. The utility of the synthetic data is evaluated by measuring how well it preserves the distributions of individual attributes and of attribute pairs, how well it preserves pairwise correlations, and by the accuracy of an ML classification model. In a comprehensive empirical evaluation, we identify the top-performing algorithms and those that consistently fail to beat baseline approaches.
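As a rough illustration of the attribute-level utility measures described above, the sketch below compares one-way and two-way marginals of a real and a synthetic table. The abstract does not name the distance used, so total variation distance is an assumption here, and the helper names (`marginal`, `total_variation`, `mean_marginal_distance`) and the toy tables are hypothetical.

```python
from collections import Counter
from itertools import combinations


def marginal(rows, cols):
    """Empirical distribution of the attribute tuple `cols` over `rows`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def mean_marginal_distance(real, synth, n_cols, order):
    """Average distance over all `order`-way marginals (order 1 or 2)."""
    subsets = list(combinations(range(n_cols), order))
    return sum(
        total_variation(marginal(real, s), marginal(synth, s)) for s in subsets
    ) / len(subsets)


# Toy categorical tables with two attributes each (made-up data).
real = [("a", 0), ("a", 1), ("b", 0), ("b", 1)]
synth = [("a", 0), ("a", 0), ("b", 1), ("b", 1)]

one_way = mean_marginal_distance(real, synth, n_cols=2, order=1)
two_way = mean_marginal_distance(real, synth, n_cols=2, order=2)
print(one_way, two_way)
```

In this toy case the synthetic table matches every individual attribute distribution exactly but distorts the joint distribution of the attribute pair, which is why a benchmark of this kind would look at pairs of attributes as well as single attributes.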