Asymptotically optimal private estimation under mean square loss
This work provides a theoretically optimal solution for private distribution estimation, which is important for privacy-preserving data analysis in fields like statistics and machine learning, though it builds incrementally on prior results.
The paper tackles the problem of estimating a discrete distribution under local differential privacy constraints, showing that their proposed privatization scheme and estimator achieve asymptotic optimality under mean square loss as the number of samples increases.
We consider the minimax estimation problem of a discrete distribution with support size $k$ under locally differential privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. In our previous work (arXiv:1702.00610), we proposed a family of new privatization schemes and the corresponding estimator. We also proved that our scheme and estimator are order optimal in the regime $e^ε \ll k$ under both $\ell_2^2$ and $\ell_1$ loss. In other words, for a large number of samples the worst-case estimation loss of our scheme was shown to differ from the optimal value by at most a constant factor. In this paper, we eliminate this gap by showing asymptotic optimality of the proposed scheme and estimator under the $\ell_2^2$ (mean square) loss. More precisely, we show that for any $k$ and $ε,$ the ratio between the worst-case estimation loss of our scheme and the optimal value approaches $1$ as the number of samples tends to infinity.