DS · CR · IT · ST · Aug 11, 2021

Statistical Inference in the Differential Privacy Model

arXiv:2108.05000v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns in data analysis for sensitive datasets, providing foundational insights into the cost of differential privacy, though it appears incremental within the mature field.

The paper quantifies the additional data required for statistical inference under differential privacy constraints. It addresses fundamental questions such as the sample complexity of hypothesis testing and distribution estimation, and presents results on private estimation of random graphs and on the trade-off between sample complexity and interactivity in hypothesis selection.

In modern data analysis, we often run algorithms on datasets that are sensitive in nature. However, classical machine learning and statistical algorithms were not designed with privacy risks in mind, and it has been demonstrated that they may reveal personal information. These concerns discourage individuals from providing their data or, even worse, encourage them to intentionally provide fake data. To assuage these concerns, we impose the constraint of differential privacy (DP), considered by many to be the gold standard of data privacy, on statistical inference. This thesis aims to quantify the cost of ensuring differential privacy, i.e., to understand how much additional data is required to perform data analysis under the constraint of differential privacy. Despite the maturity of the literature on differential privacy, some of the most fundamental settings are still inadequately understood. In particular, we make progress on the following problems:

• What is the sample complexity of DP hypothesis testing?
• Can we privately estimate distribution properties with a negligible cost?
• What is the fundamental limit in private distribution estimation?
• How can we design algorithms to privately estimate random graphs?
• What is the trade-off between the sample complexity and the interactivity in private hypothesis selection?

