CR · Dec 16, 2021

Benchmarking Differentially Private Synthetic Data Generation Algorithms

arXiv:2112.09238v2 · 113 citations
Originality: Synthesis-oriented
AI Analysis

This work offers a comparative analysis useful to researchers and practitioners in privacy-preserving data generation. Its contribution is incremental: it benchmarks existing methods rather than introducing new algorithms.

The authors systematically benchmarked differentially private synthetic data generation algorithms for tabular data. They evaluated utility through preservation of single-attribute and pairwise distributions, pairwise correlation, and classification accuracy, and identified both the top-performing algorithms and those that consistently fail to beat baseline approaches.

This work presents a systematic benchmark of differentially private synthetic data generation algorithms that can generate tabular data. The utility of the synthetic data is evaluated by measuring whether it preserves the distributions of individual attributes and of pairs of attributes, whether it preserves pairwise correlations, and how accurate an ML classification model is. In a comprehensive empirical evaluation, we identify the top-performing algorithms and those that consistently fail to beat baseline approaches.
