CR · Aug 11, 2021

Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

arXiv:2108.04978v1 · 167 citations
Originality: Highly original
AI Analysis

This paper provides a scalable, general method for generating synthetic data with differential privacy guarantees, addressing a key challenge in data sharing and analysis.

The authors address differentially private synthetic data generation with a general three-step approach. One instance of it, NIST-MST, won the 2018 NIST competition; a second, MST, works in more general settings while performing comparably to NIST-MST.

We propose a general approach for differentially private synthetic data generation that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings while still performing comparably to NIST-MST. We believe our general approach should be of broad interest and can be adopted in future mechanisms for synthetic data generation.
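The three steps in the abstract can be illustrated with a minimal NumPy sketch. It is not NIST-MST, MST, or Private-PGM. Step 1 uses a hypothetical fixed chain of two-way marginals instead of a private selection procedure. Step 2 adds Gaussian noise calibrated under the assumption of ρ-zCDP with add/remove-one neighbours, where each marginal has L2 sensitivity 1. Step 3 replaces Private-PGM with a much simpler stand-in that clips the noisy tables and samples attributes one at a time along the chain. All function names, domain sizes, and the budget `rho` are illustrative assumptions.

```python
import numpy as np


def measure_marginal(data, cols, domain, sigma, rng):
    """Step 2: noisy contingency table over the attributes in `cols`."""
    shape = tuple(domain[c] for c in cols)
    # Map each record's values on `cols` to one flat cell index, then count.
    idx = np.ravel_multi_index(tuple(data[:, c] for c in cols), shape)
    counts = np.bincount(idx, minlength=int(np.prod(shape))).reshape(shape)
    # Gaussian mechanism: each record affects one cell, so L2 sensitivity is 1.
    return counts + rng.normal(0.0, sigma, size=shape)


def normalise(weights):
    """Clip negative noisy counts and rescale to a probability vector."""
    w = np.clip(weights, 0.0, None)
    total = w.sum()
    return w / total if total > 0 else np.full(w.shape, 1.0 / w.size)


def sample_chain(measurements, domain, n, rng):
    """Step 3 (simple stand-in for Private-PGM): sample attributes along the
    hypothetical chain (0, 1), (1, 2), ..., using only the noisy tables."""
    d = len(domain)
    synth = np.empty((n, d), dtype=int)
    # First attribute: row sums of the first noisy pairwise table.
    first = np.clip(measurements[(0, 1)], 0.0, None).sum(axis=1)
    synth[:, 0] = rng.choice(domain[0], size=n, p=normalise(first))
    for j in range(1, d):
        table = measurements[(j - 1, j)]
        # Sample attribute j conditioned on the already sampled attribute j-1.
        for v in range(domain[j - 1]):
            rows = synth[:, j - 1] == v
            synth[rows, j] = rng.choice(domain[j], size=int(rows.sum()), p=normalise(table[v]))
    return synth


rng = np.random.default_rng(0)
domain = [3, 4, 2]  # hypothetical attribute domain sizes
data = np.column_stack([rng.integers(0, k, size=1000) for k in domain])

pairs = [(0, 1), (1, 2)]                 # Step 1: fixed selection (assumption)
rho = 1.0                                # assumed total zCDP budget
sigma = np.sqrt(len(pairs) / (2 * rho))  # each measurement costs 1 / (2 sigma^2)
measurements = {p: measure_marginal(data, p, domain, sigma, rng) for p in pairs}
synth = sample_chain(measurements, domain, n=1000, rng=rng)
print(synth.shape)  # (1000, 3)
```

Clipping and sampling touch only the noisy tables, never the raw data, so step 3 is post-processing and consumes no extra privacy budget. Private-PGM plays the same role but fits a single consistent distribution to all measurements, including sets of marginals that do not form a simple chain.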

Code Implementations: 1 repo
