CR Dec 16, 2021

Benchmarking Differentially Private Synthetic Data Generation Algorithms

Yuchao Tao, Ryan McKenna, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

arXiv:2112.09238v236.2116 citations

Originality: Synthesis-oriented

AI Analysis

This work provides a comparative analysis for researchers and practitioners in privacy-preserving data generation, but it is incremental as it benchmarks existing methods without introducing new algorithms.

The authors systematically benchmarked differentially private synthetic data generation algorithms for tabular data, evaluating utility through distribution preservation, correlation, and classification accuracy, and identified top-performing and underperforming algorithms.

This work presents a systematic benchmark of differentially private synthetic data generation algorithms for tabular data. Utility of the synthetic data is evaluated by measuring whether it preserves the distributions of individual attributes and of attribute pairs, whether it preserves pairwise correlations, and by the accuracy of an ML classification model. In a comprehensive empirical evaluation, we identify the top-performing algorithms and those that consistently fail to beat baseline approaches.
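As a rough illustration of the first utility criterion (preserving the distributions of individual attributes and attribute pairs), the sketch below compares one-way and two-way marginals of a real and a synthetic table. The abstract does not say which distance measure the benchmark uses; total variation distance is an assumption made here, and the function names and toy rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def marginal(rows, attrs):
    """Empirical distribution of the attribute tuple `attrs` over a list of dict rows."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (assumed metric)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def marginal_errors(real, synth, attrs, k):
    """Distance between real and synthetic k-way marginals, per attribute combination."""
    return {
        combo: tv_distance(marginal(real, combo), marginal(synth, combo))
        for combo in combinations(attrs, k)
    }


# Toy categorical tables, invented purely for illustration.
real = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "high"},
]
synth = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "high"},
]

one_way = marginal_errors(real, synth, ["age", "income"], 1)
two_way = marginal_errors(real, synth, ["age", "income"], 2)
print(one_way)
print(two_way)
```

A value of 0 means the synthetic marginal matches the real one exactly, and larger values mean a worse match. In this toy data the `age` marginal is preserved while the `income` marginal and the joint `(age, income)` marginal are not.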
