Optimal Schemes for Discrete Distribution Estimation under Locally Differential Privacy
This work addresses privacy-preserving data analysis for statisticians and data scientists, offering incremental improvements in a specific regime.
The paper tackles the problem of estimating discrete distributions under local differential privacy, proposing a new family of schemes that improve performance in the medium privacy regime, reducing expected estimation loss by 50% under ℓ₂² metric and 30% under ℓ₁ metric compared to existing schemes.
We consider the minimax estimation problem of a discrete distribution with support size $k$ under privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. For a given $ε,$ we consider the problem of constructing optimal privatization schemes with $ε$-privacy level, i.e., schemes that minimize the expected estimation loss for the worst-case distribution. Two schemes in the literature provide order optimal performance in the high privacy regime where $ε$ is very close to $0,$ and in the low privacy regime where $e^ε\approx k,$ respectively. In this paper, we propose a new family of schemes which substantially improve the performance of the existing schemes in the medium privacy regime when $1\ll e^ε \ll k.$ More concretely, we prove that when $3.8 < ε<\ln(k/9) ,$ our schemes reduce the expected estimation loss by $50\%$ under $\ell_2^2$ metric and by $30\%$ under $\ell_1$ metric over the existing schemes. We also prove a lower bound for the region $e^ε \ll k,$ which implies that our schemes are order optimal in this regime.