LG CR ML · Oct 29, 2021

Combining Public and Private Data

arXiv:2111.00115v1 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy-preserving data analysis for scenarios involving mixed public and private data, representing an incremental improvement over existing techniques.

The paper tackles the problem of estimating aggregate statistics from data with heterogeneous privacy needs by introducing mixed estimators for mean and median optimized to minimize variance. The experiments show that these mechanisms often outperform baseline methods from prior work.

Differential privacy is widely adopted to provide provable privacy guarantees in data analysis. We consider the problem of combining public and private data (and, more generally, data with heterogeneous privacy needs) for estimating aggregate statistics. We introduce a mixed estimator of the mean optimized to minimize the variance. We argue that our mechanism is preferable to techniques that preserve the privacy of individuals by subsampling data proportionally to the privacy needs of users. Similarly, we present a mixed median estimator based on the exponential mechanism. We compare our mechanisms to the methods proposed in Jorgensen et al. [2015]. Our experiments provide empirical evidence that our mechanisms often outperform the baseline methods.
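The idea of a variance-minimizing mixed mean can be illustrated with generic inverse-variance weighting. The sketch below is not the paper's estimator; it is a minimal illustration under stated assumptions: all values lie in [0, 1], public and private records share one distribution, the private mean is released with the standard Laplace mechanism, and the spread is estimated from public data only. The names `mixed_mean` and `laplace_noise` are hypothetical.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mixed_mean(public, private, epsilon, rng):
    """Combine a public sample mean with an epsilon-DP private mean.

    Assumption: every value lies in [0, 1], so replacing one private record
    changes the private mean by at most 1 / len(private).
    """
    n_pub, n_priv = len(public), len(private)

    # The spread is estimated from public data only, so it uses no privacy budget.
    sigma2 = statistics.pvariance(public)

    # Laplace mechanism on the private mean (sensitivity 1 / n_priv).
    scale = 1.0 / (n_priv * epsilon)
    noisy_private_mean = statistics.fmean(private) + laplace_noise(scale, rng)

    # Variance of each estimate: sampling error, plus Laplace noise (2 * scale^2)
    # for the private one.
    var_pub = sigma2 / n_pub
    var_priv = sigma2 / n_priv + 2.0 * scale ** 2

    # Inverse-variance weights minimize the variance of an unbiased linear
    # combination of two independent unbiased estimates.
    w_pub = var_priv / (var_pub + var_priv)
    estimate = w_pub * statistics.fmean(public) + (1.0 - w_pub) * noisy_private_mean
    return estimate, w_pub


if __name__ == "__main__":
    rng = random.Random(0)
    public = [rng.random() for _ in range(200)]
    private = [rng.random() for _ in range(2000)]
    estimate, w_pub = mixed_mean(public, private, epsilon=1.0, rng=rng)
    print(f"estimate={estimate:.4f}, weight on public mean={w_pub:.3f}")
```

Because the weights depend only on public data, the sample sizes, and epsilon, computing them is post-processing and does not weaken the privacy guarantee of the private mean. Unlike subsampling, every record contributes to the estimate; only the weighting changes.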

