DS CR IT LG ML · Feb 21, 2020

Private Mean Estimation of Heavy-Tailed Distributions

Gautam Kamath, Vikrant Singhal, Jonathan Ullman

arXiv:2002.09464v3 · 26.2 · 117 citations

Originality: Highly original

AI Analysis

This work addresses private statistical estimation for heavy-tailed data. It provides fundamental bounds showing that privacy changes the problem qualitatively compared with non-private estimation. The contribution is incremental in that it extends prior privacy results to broader distribution classes.

The paper studies differentially private mean estimation for heavy-tailed distributions with bounded k-th moments. In the univariate case, it shows that the sample complexity is Θ(1/α² + 1/(α^(k/(k-1))ε)). By contrast, absent privacy the sample complexity is the same for all k ≥ 2. For the multivariate setting, it gives algorithms whose sample complexity is larger by a factor of O(d).
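Where the privacy term's exponent k/(k-1) comes from can be seen with a back-of-the-envelope heuristic. This is a standard clip-and-add-noise intuition, not the paper's proof. It assumes E|X − μ|^k ≤ 1, clipping to [μ − T, μ + T] around a known center, and ignores constants.

```latex
\begin{align*}
\text{bias} &\le \mathbb{E}\big[|X-\mu|\,\mathbf{1}\{|X-\mu|>T\}\big]
  \le \frac{\mathbb{E}|X-\mu|^k}{T^{k-1}} \le \frac{1}{T^{k-1}}, \\
\text{privacy noise} &\approx \frac{T}{n\varepsilon}.
\end{align*}
% Choosing T = \alpha^{-1/(k-1)} makes the bias at most \alpha.
% The privacy noise is then at most \alpha once
\[
n \gtrsim \frac{T}{\alpha\varepsilon}
  = \frac{1}{\alpha^{\frac{1}{k-1}+1}\varepsilon}
  = \frac{1}{\alpha^{\frac{k}{k-1}}\varepsilon}.
\]
% The 1/\alpha^2 term is the ordinary sampling error: for k >= 2 the moment
% bound implies variance at most 1, so n \gtrsim 1/\alpha^2 suffices non-privately.
```

As k grows, k/(k-1) decreases toward 1, so the privacy term shrinks. Without privacy, only the variance matters, which is why the non-private rate does not depend on k.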

We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded $k$-th moments. Roughly speaking, in the univariate case, we show that $n = \Theta\left(\frac{1}{\alpha^2} + \frac{1}{\alpha^{\frac{k}{k-1}}\varepsilon}\right)$ samples are necessary and sufficient to estimate the mean to $\alpha$-accuracy under $\varepsilon$-differential privacy, or any of its common relaxations. This result demonstrates a qualitatively different behavior compared to estimation absent privacy constraints, for which the sample complexity is identical for all $k \geq 2$. We also give algorithms for the multivariate setting whose sample complexity is a factor of $O(d)$ larger than the univariate case.
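The abstract does not describe the estimator itself. A common way to privately estimate a heavy-tailed mean is to clip samples to a bounded interval and add Laplace noise. The minimal Python sketch below implements that generic technique; it is not necessarily the authors' algorithm. The names `heuristic_threshold`, `clipped_private_mean`, and `laplace`, and the parameters `center`, `radius`, and `moment_bound`, are hypothetical. The sketch assumes E|X − μ|^k ≤ `moment_bound` and that `center` is a public, data-independent point close to μ. A complete algorithm would also need to find such a center privately, which this sketch does not do.

```python
import math
import random

# Hypothetical helpers: a generic clip-and-add-Laplace-noise sketch,
# not the paper's algorithm.


def heuristic_threshold(alpha, k, moment_bound=1.0):
    """Clipping radius T whose truncation bias is at most alpha.

    Assumes E|X - mu|^k <= moment_bound and clipping centered at mu. The bias
    is then at most moment_bound / T**(k - 1), which equals alpha when
    T = (moment_bound / alpha) ** (1 / (k - 1)).
    """
    if k <= 1 or alpha <= 0:
        raise ValueError("need k > 1 and alpha > 0")
    return (moment_bound / alpha) ** (1.0 / (k - 1))


def laplace(scale, rng):
    """Draw from Laplace(0, scale) using only the standard library."""
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def clipped_private_mean(xs, center, radius, epsilon, rng=None):
    """epsilon-DP mean estimate via clipping plus Laplace noise.

    `center` and `radius` must not depend on the data (assumption).
    Each sample is clipped to [center - radius, center + radius]. Replacing
    one sample moves the clipped mean by at most 2 * radius / n, so Laplace
    noise with scale 2 * radius / (n * epsilon) gives epsilon-DP.
    """
    n = len(xs)
    if n == 0 or epsilon <= 0 or radius <= 0:
        raise ValueError("need n > 0, epsilon > 0, radius > 0")
    rng = rng or random.Random()
    lo, hi = center - radius, center + radius
    clipped_mean = sum(min(max(x, lo), hi) for x in xs) / n
    return clipped_mean + laplace(2.0 * radius / (n * epsilon), rng)
```

For example, `clipped_private_mean(xs, center=0.0, radius=heuristic_threshold(0.1, 4), epsilon=1.0)` clips at radius 10^(1/3) ≈ 2.15. A heavier tail (smaller k) forces a larger radius for the same bias. Because the noise scale grows with the radius, more samples are needed to push the noise below α. Laplace noise is drawn as a difference of exponentials so the sketch needs nothing beyond the standard library.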
