ML · AI · CR · LG · Mar 2, 2020

Generating Higher-Fidelity Synthetic Datasets with Privacy Guarantees

arXiv:2003.00997v1 · 5 citations
AI Analysis

This work addresses user privacy in data annotation and inspection tasks. Its contribution is incremental: it builds on prior work and improves sample fidelity and the privacy-utility trade-off.

The paper tackles the problem of enhancing user privacy in machine learning tasks by substituting real data with synthetic samples from a generative adversarial network. These samples have higher fidelity than those of prior work. This makes it possible to detect more subtle data errors and biases. It also reduces the need to label real data, because models trained directly on the synthetic samples reach high accuracy.

This paper considers the problem of enhancing user privacy in common machine learning development tasks, such as data annotation and inspection, by substituting the real data with samples from a generative adversarial network. We propose employing Bayesian differential privacy as the means to achieve a rigorous theoretical guarantee while providing a better privacy-utility trade-off. We demonstrate experimentally that our approach produces higher-fidelity samples than prior work, allowing us to (1) detect more subtle data errors and biases, and (2) reduce the need for real data labelling by achieving high accuracy when training directly on artificial samples.
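The abstract does not spell out the training procedure. The sketch below therefore assumes something common in private GAN training: the discriminator, the only component that sees real records, is updated with a sanitized gradient step familiar from DP-SGD. Each per-example gradient is clipped to an L2 bound, the results are summed, Gaussian noise is added, and the sum is averaged. The Bayesian privacy accounting that the paper proposes is not modelled here. The functions `clip_per_example`, `sanitized_gradient` and `logistic_discriminator_grads`, the toy linear discriminator, and the parameter values are all hypothetical.

```python
import numpy as np


def clip_per_example(grads: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale each row (one example's gradient) so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Factor is 1 for rows already within the bound, clip_norm / norm otherwise.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * factors


def sanitized_gradient(grads: np.ndarray, clip_norm: float,
                       noise_multiplier: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Clip per-example gradients, sum them, add Gaussian noise, then average.

    Noise standard deviation is noise_multiplier * clip_norm, as in DP-SGD.
    Privacy accounting (Bayesian or otherwise) is NOT performed here.
    """
    clipped = clip_per_example(grads, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


def logistic_discriminator_grads(w: np.ndarray, x: np.ndarray,
                                 y: np.ndarray) -> np.ndarray:
    """Per-example gradients of binary cross-entropy for a hypothetical
    linear discriminator; y = 1 for real records, 0 for generated ones."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - y)[:, None] * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: "real" records and "generated" records in 5 dimensions.
    real = rng.normal(1.0, 1.0, size=(64, 5))
    fake = rng.normal(-1.0, 1.0, size=(64, 5))
    x = np.vstack([real, fake])
    y = np.concatenate([np.ones(64), np.zeros(64)])

    w = np.zeros(5)
    for _ in range(200):
        g = sanitized_gradient(logistic_discriminator_grads(w, x, y),
                               clip_norm=1.0, noise_multiplier=1.1, rng=rng)
        w -= 0.5 * g

    accuracy = np.mean((x @ w > 0) == (y == 1))
    print(f"discriminator accuracy on toy data: {accuracy:.2f}")
```

Clipping bounds how much any single record can move the update, and the noise is scaled to that bound. This is what lets a DP-style analysis bound the privacy loss per step. The paper's contribution, per the abstract, lies in how that loss is accounted for, not in this update rule.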

