Combining Public and Private Data
This work addresses privacy-preserving data analysis when public and private data are mixed, and is an incremental improvement over existing techniques.
The paper tackles the problem of estimating aggregate statistics from data with heterogeneous privacy needs. It introduces a mixed mean estimator optimized to minimize variance and a mixed median estimator based on the exponential mechanism. The experiments show that these mechanisms often outperform the baseline methods of Jorgensen et al. [2015].
Differential privacy is widely adopted to provide provable privacy guarantees in data analysis. We consider the problem of combining public and private data (and, more generally, data with heterogeneous privacy needs) for estimating aggregate statistics. We introduce a mixed estimator of the mean optimized to minimize the variance. We argue that our mechanism is preferable to techniques that preserve the privacy of individuals by subsampling data proportionally to the privacy needs of users. Similarly, we present a mixed median estimator based on the exponential mechanism. We compare our mechanisms to the methods proposed in Jorgensen et al. [2015]. Our experiments provide empirical evidence that our mechanisms often outperform the baseline methods.
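To make the idea of a variance-minimizing mixed mean concrete, the following is a minimal sketch and not the paper's actual estimator. It assumes that public and private records are i.i.d. draws from the same distribution, that values are clipped to a known range [lower, upper], that the private mean is released with Laplace noise, and that the data variance is estimated from the public records only (at least two are needed). The names `optimal_weight` and `mixed_mean` are hypothetical.

```python
import numpy as np

def optimal_weight(var_public, var_private):
    """Weight on the public estimate that minimizes the variance of
    w * public + (1 - w) * private for two independent, unbiased estimates."""
    return var_private / (var_public + var_private)

def mixed_mean(public, private, epsilon, lower, upper, rng=None):
    """Combine a public sample mean with a Laplace-noised private mean.

    Assumption (not from the source): both samples come from the same
    distribution, and the record count of the private sample is not secret.
    """
    rng = np.random.default_rng() if rng is None else rng
    public = np.clip(np.asarray(public, dtype=float), lower, upper)
    private = np.clip(np.asarray(private, dtype=float), lower, upper)
    n_pub, n_priv = len(public), len(private)

    # Changing one private record moves the private mean by at most
    # (upper - lower) / n_priv, so this scale gives epsilon-DP for that mean.
    scale = (upper - lower) / (n_priv * epsilon)
    noisy_private_mean = private.mean() + rng.laplace(0.0, scale)
    public_mean = public.mean()

    # Data variance is estimated from public records only (no privacy cost).
    sigma2 = public.var(ddof=1)
    var_public = sigma2 / n_pub
    # Sampling variance plus Laplace noise variance (2 * scale^2).
    var_private = sigma2 / n_priv + 2.0 * scale ** 2

    w = optimal_weight(var_public, var_private)
    return w * public_mean + (1.0 - w) * noisy_private_mean, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pub = rng.uniform(0.0, 1.0, size=200)
    priv = rng.uniform(0.0, 1.0, size=2000)
    estimate, weight = mixed_mean(pub, priv, epsilon=0.5, lower=0.0, upper=1.0, rng=rng)
    print(f"estimate={estimate:.4f}, weight on public mean={weight:.3f}")
```

As epsilon grows, the Laplace term vanishes and the weight on the public mean approaches n_pub / (n_pub + n_priv), so the estimate approaches the pooled sample mean. As epsilon shrinks, the weight moves toward the public mean, which uses every record instead of discarding some through subsampling.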