LG CR May 27, 2021

Causally Constrained Data Synthesis for Private Data Release

arXiv:2105.13144v1, 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the privacy-utility trade-off in data release for applications requiring evidence-based decisions, offering an incremental improvement over existing differentially private mechanisms.

The paper tackles the problem of generating synthetic data with strong privacy guarantees while maintaining utility, by incorporating causal information into the training process. The results show that this approach improves resilience to membership inference and enhances downstream utility, and the authors prove that it yields stronger differential privacy guarantees.

Making evidence-based decisions requires data. However, for real-world applications, the privacy of that data is critical. Using synthetic data that reflects certain statistical properties of the original data preserves the privacy of the original data. To this end, prior works use differentially private data release mechanisms to provide formal privacy guarantees. However, such mechanisms have unacceptable privacy vs. utility trade-offs. We propose incorporating causal information into the training process to improve this trade-off. We theoretically prove that generative models trained with additional causal knowledge provide stronger differential privacy guarantees. Empirically, we evaluate our solution by comparing different models based on variational auto-encoders (VAEs), and show that causal information improves resilience to membership inference while also improving downstream utility.
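The privacy vs. utility trade-off mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it uses the standard Laplace mechanism on a bounded mean query, and every name in it (`laplace_noise`, `private_mean`, `mean_abs_error`) and the toy data are hypothetical, chosen only for illustration. It shows that a smaller privacy budget epsilon (stronger privacy) means larger noise and a less accurate released statistic.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon, lower, upper, rng):
    # Clip each value so that replacing one record changes the mean
    # by at most (upper - lower) / n; this bound is the sensitivity.
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return statistics.fmean(clipped) + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(values, epsilon, trials=2000, seed=0):
    # Average absolute error of the private mean over repeated releases.
    rng = random.Random(seed)
    true_mean = statistics.fmean(values)
    errors = [
        abs(private_mean(values, epsilon, 0.0, 1.0, rng) - true_mean)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [data_rng.random() for _ in range(100)]  # toy data in [0, 1)
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean abs error = {mean_abs_error(data, eps):.5f}")
```

In this toy setting the expected error grows in proportion to 1/epsilon, which is the kind of cost the abstract describes as an unacceptable trade-off and that the proposed causal constraints aim to reduce.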

