Private Estimation when Data and Privacy Demands are Correlated
This work addresses private estimation for users with varying privacy demands, particularly when those demands correlate with the data. It offers a more flexible approach than enforcing equal privacy for all users, though the contribution is incremental in that it extends DP to heterogeneous constraints.
The paper tackles the problem of empirical mean and frequency estimation under differential privacy with heterogeneous user privacy demands, where data and privacy requirements may be correlated. It establishes theoretical guarantees, achieving minimax optimality in some cases, and shows superior performance in experiments compared to baselines.
Differential Privacy (DP) is the current gold standard for ensuring privacy for statistical queries. Work on estimation problems under DP constraints has largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, both subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case, and we study both problems under two different formulations: first, where privacy demands and data may be correlated, and second, where correlations are weakened by random permutation of the dataset. We establish theoretical performance guarantees for our proposed algorithms, under both PAC error and mean-squared error. These performance guarantees translate to minimax optimality in several instances, and experiments confirm superior performance of our algorithms over other baseline techniques.
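To make the heterogeneous setting concrete, the following is a minimal sketch of a simple weighted Laplace baseline for mean estimation. It is not the paper's proposed algorithm. The function name `weighted_laplace_mean`, the data range `[lo, hi]`, and the choice of weights proportional to each user's privacy parameter are all assumptions made for illustration. User i's sample is weighted by w_i, and the Laplace noise scale is chosen so that w_i (hi - lo) / scale <= eps_i for every user, which is the condition for user i's eps_i demand to hold when one sample is replaced.

```python
import random


def weighted_laplace_mean(data, epsilons, lo=0.0, hi=1.0, rng=None):
    """Illustrative baseline (hypothetical, not the paper's method).

    Returns (noisy_estimate, weights, scale) for a weighted mean of `data`
    released with Laplace noise, where user i demands eps_i-DP.
    """
    if not data or len(data) != len(epsilons):
        raise ValueError("data and epsilons must be non-empty and equal length")
    if any(e <= 0 for e in epsilons):
        raise ValueError("every privacy parameter must be positive")
    rng = rng or random.Random()

    # Assumption: weights proportional to eps_i, so stricter users count less.
    total_eps = sum(epsilons)
    weights = [e / total_eps for e in epsilons]

    # Clip to the assumed range so one user's influence is bounded.
    clipped = [min(max(x, lo), hi) for x in data]
    weighted_sum = sum(w * x for w, x in zip(weights, clipped))

    # Replacing user i's sample moves the sum by at most w_i * (hi - lo).
    # Laplace noise of this scale satisfies every user's demand at once.
    scale = (hi - lo) * max(w / e for w, e in zip(weights, epsilons))

    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return weighted_sum + noise, weights, scale


if __name__ == "__main__":
    data = [0.2, 0.9, 0.5]
    epsilons = [0.5, 1.0, 2.0]  # heterogeneous privacy demands
    estimate, weights, scale = weighted_laplace_mean(
        data, epsilons, rng=random.Random(0)
    )
    print(estimate, weights, scale)
```

Weighting by the privacy parameters estimates a weighted mean rather than the empirical mean. When data and privacy demands are correlated the two can differ substantially, which is the difficulty the correlated formulation above is concerned with.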