CR · LG · Oct 10, 2023

Partition-based differentially private synthetic data generation

arXiv:2310.06371v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work provides a solution for organizations needing to share sensitive data privately, though it appears incremental as it builds on the select-measure-generate paradigm.

The paper tackles the problem of generating high-quality differentially private synthetic data. It targets two weaknesses of existing methods: error in measuring large-domain marginals and the difficulty of allocating the privacy budget across iterations. The authors report improved data quality and utility compared to existing methods.

Private synthetic data sharing is preferred over releasing summary statistics because it preserves the distribution and nuances of the original data. State-of-the-art methods adopt a select-measure-generate paradigm, but measuring large-domain marginals still introduces substantial error, and allocating the privacy budget iteratively remains difficult. To address these issues, our method employs a partition-based approach that reduces errors and improves the quality of synthetic data, even with a limited privacy budget. Our experiments show that the method outperforms existing approaches: the synthetic data it produces exhibits improved quality and utility, making it a preferable choice for private synthetic data sharing.
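
The claim that large-domain marginals are measured with much error can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a one-way marginal measured with the standard Laplace mechanism under add/remove-one-record neighbouring, and the function names (noisy_marginal, mean_l1_error) and parameter values are hypothetical. Because every cell receives independent noise of the same scale, the total L1 error grows roughly linearly with the number of cells.

```python
import numpy as np


def noisy_marginal(data, domain_size, epsilon, rng):
    """Measure a one-way marginal (histogram) with the Laplace mechanism.

    Assumption: neighbouring datasets differ by adding or removing one
    record, so the histogram's L1 sensitivity is 1.
    """
    counts = np.bincount(data, minlength=domain_size).astype(float)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=domain_size)
    return counts + noise


def mean_l1_error(domain_size, n_records=10_000, epsilon=1.0, trials=200, seed=0):
    """Average total L1 error of the noisy marginal over several trials."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: records drawn uniformly over the attribute domain.
    data = rng.integers(0, domain_size, size=n_records)
    true_counts = np.bincount(data, minlength=domain_size).astype(float)
    errors = [
        np.abs(noisy_marginal(data, domain_size, epsilon, rng) - true_counts).sum()
        for _ in range(trials)
    ]
    return float(np.mean(errors))


# Same number of records and same privacy budget, different domain sizes.
small = mean_l1_error(domain_size=10)
large = mean_l1_error(domain_size=10_000)
print(f"L1 error, 10 cells:     {small:.1f}")
print(f"L1 error, 10,000 cells: {large:.1f}")
```

With the number of records and epsilon held fixed, the larger domain spreads the same data over many more cells while each cell still pays the full noise cost, which is the kind of error the abstract refers to.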

