CR, PR, ST | Jul 13, 2021

Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

arXiv:2107.05824v2 | 24 citations
Originality: Highly original
AI Analysis

This work addresses the conflict between privacy and utility in data release for research, business, and government, offering a novel approach to synthetic data generation with provable guarantees.

The paper takes on the NP-hard problem of designing a synthetic data generation method that is computationally efficient, provably private, and rigorous in quantifying utility. It solves a relaxed version of this problem by studying covariance loss, a question in probability about how much information is lost when taking conditional expectation. The resulting techniques yield constructive, approximately optimal solutions for microaggregation, privacy, and synthetic data.

The protection of private information is of vital importance in data-driven research, business, and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, which have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction: synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper we focus on the NP-hard challenge of developing a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees, and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but at first glance completely unrelated, problem in probability concerning the concept of covariance loss. Namely, we find a nearly optimal and constructive answer to the question of how much information is lost when we take conditional expectation. Surprisingly, this excursion into theoretical probability produces mathematical techniques that allow us to derive constructive, approximately optimal solutions to difficult applied problems concerning microaggregation, privacy, and synthetic data.
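To make "how much information is lost when we take conditional expectation" concrete, here is a minimal sketch, not the paper's algorithm. It assumes the loss is measured as Cov(X) minus Cov(E[X | group]) on an empirical dataset, with the conditional expectation taken over a fixed partition of the rows, which is what microaggregation does when it replaces each record by its group mean. The function names (`mean_vector`, `covariance`, `microaggregate`, `covariance_loss`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Population (1/n) covariance matrix of a list of equal-length rows."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    return [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n for j in range(d)]
        for i in range(d)
    ]


def microaggregate(rows, labels):
    """Replace each row by the mean of its group (empirical conditional expectation)."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    means = {lab: mean_vector(g) for lab, g in groups.items()}
    return [means[lab] for lab in labels]


def covariance_loss(rows, labels):
    """Cov(X) - Cov(E[X | labels]), computed entry by entry."""
    cov_x = covariance(rows)
    cov_y = covariance(microaggregate(rows, labels))
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(cov_x, cov_y)]


# Toy data (an assumption for illustration): four 2-D records in two groups.
rows = [[0.0, 0.0], [2.0, 0.0], [10.0, 4.0], [12.0, 8.0]]
labels = [0, 0, 1, 1]
loss = covariance_loss(rows, labels)
```

By the law of total covariance, this difference equals the average within-group covariance, so it is always positive semidefinite. It is zero when every record forms its own group and equals the full covariance when all records share one group. Finer partitions lose less covariance but average over fewer records per group.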

