Better and Simpler Lower Bounds for Differentially Private Statistical Estimation
This work addresses fundamental limits of privacy-preserving data analysis that matter to statisticians and machine learning practitioners, proving rigorous lower bounds that both extend and simplify prior results.
The paper establishes optimal sample complexity lower bounds for statistical estimation under approximate differential privacy, proving bounds for covariance estimation of Gaussians and mean estimation of heavy-tailed distributions that match known upper bounds up to logarithmic factors.
We provide optimal lower bounds for two well-known parameter estimation (also known as statistical estimation) tasks in high dimensions with approximate differential privacy. First, we prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha\varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors. This result improves over previous work which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work for this problem was only able to establish this lower bound against pure differential privacy, or in the special case of $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation utilizes a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation, without sufficiently many samples.
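To give a feel for how the two bounds scale, the following minimal Python sketch evaluates the expressions inside the $\tilde{\Omega}(\cdot)$ for concrete parameters. It is an illustration only: constants and logarithmic factors are dropped, so the outputs are orders of magnitude rather than exact sample counts. The function names and the parameter names `d`, `alpha`, `eps`, and `k` are hypothetical choices for this sketch and do not come from the paper.

```python
import math


def covariance_lower_bound(d: int, alpha: float, eps: float) -> float:
    """Evaluate d^{3/2} / (alpha * eps) + d / alpha^2.

    Illustrative only: constants and log factors hidden by the
    tilde-Omega notation are ignored (an assumption of this sketch).
    """
    if d < 1 or alpha <= 0 or eps <= 0:
        raise ValueError("need d >= 1, alpha > 0, eps > 0")
    private_term = d ** 1.5 / (alpha * eps)   # cost attributable to privacy
    statistical_term = d / alpha ** 2         # cost present even without privacy
    return private_term + statistical_term


def heavy_tailed_mean_lower_bound(d: int, alpha: float, eps: float, k: float) -> float:
    """Evaluate d / (alpha^{k/(k-1)} * eps) + d / alpha^2.

    k is the order of the bounded moment; k > 1 is required here so the
    exponent k / (k - 1) is well defined (a guard added for this sketch).
    """
    if d < 1 or alpha <= 0 or eps <= 0:
        raise ValueError("need d >= 1, alpha > 0, eps > 0")
    if k <= 1:
        raise ValueError("need k > 1")
    private_term = d / (alpha ** (k / (k - 1)) * eps)
    statistical_term = d / alpha ** 2
    return private_term + statistical_term


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    d, alpha, eps = 100, 0.1, 1.0
    print("covariance:", covariance_lower_bound(d, alpha, eps))
    for k in (2, 4, 8):
        print(f"heavy-tailed mean, k={k}:", heavy_tailed_mean_lower_bound(d, alpha, eps, k))
```

For $\alpha < 1$, the exponent $k/(k-1)$ decreases toward 1 as $k$ grows, so the privacy term in the heavy-tailed bound shrinks when more moments are bounded, while the $d/\alpha^2$ term is unaffected.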