Differential privacy and robust statistics in high dimensions
This work addresses the challenge of maintaining privacy while ensuring statistical efficiency in high-dimensional data analysis, representing a novel method for a known bottleneck in differential privacy.
The authors tackled the problem of achieving differential privacy in high-dimensional statistical estimation by introducing the HPTR framework, which combines exponential mechanisms, robust statistics, and Propose-Test-Release to reduce local sensitivity and provide near-optimal utility guarantees, as demonstrated in mean estimation, linear regression, covariance estimation, and PCA.
We introduce a universal framework for characterizing the statistical efficiency of a statistical estimation problem with differential privacy guarantees. Our framework, which we call High-dimensional Propose-Test-Release (HPTR), builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism. Gluing all these together is the concept of resilience, which is central to robust statistical estimation. Resilience guides the design of the algorithm, the sensitivity analysis, and the success probability analysis of the test step in Propose-Test-Release. The key insight is that if we design an exponential mechanism that accesses the data only via one-dimensional robust statistics, then the resulting local sensitivity can be dramatically reduced. Using resilience, we can provide tight local sensitivity bounds. These tight bounds readily translate into near-optimal utility guarantees in several cases. We give a general recipe for applying HPTR to a given instance of a statistical estimation problem and demonstrate it on canonical problems of mean estimation, linear regression, covariance estimation, and principal component analysis. We introduce a general utility analysis technique that proves that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.