Distribution-Aware Mean Estimation under User-level Local Differential Privacy
This work addresses a more realistic scenario in privacy-preserving statistics, relevant to applications such as federated learning, though the contribution is incremental: it extends known bounds to variable per-user sample sizes.
The paper tackles mean estimation under user-level local differential privacy when users have varying numbers of data samples, establishing distribution-dependent upper and lower bounds that asymptotically match up to logarithmic factors and generalize prior work.
We consider the problem of mean estimation under user-level local differential privacy, where each of $n$ users contributes through a local pool of data samples. Previous work assumes that the number of data samples is the same across users. In contrast, we consider a more general and realistic scenario where each user $u \in [n]$ owns $m_u$ data samples drawn from some generative distribution $μ$; $m_u$ being unknown to the statistician but drawn from a known distribution $M$ over $\mathbb{N}^\star$. Based on a distribution-aware mean estimation algorithm, we establish an $M$-dependent upper bound on the worst-case risk over $μ$ for the task of mean estimation. We then derive a lower bound. The two bounds are asymptotically matching up to logarithmic factors and reduce to known bounds when $m_u = m$ for every user $u$.
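To make the setting concrete, the following Python sketch simulates users with heterogeneous sample counts and runs a naive baseline: each user privatizes the mean of their own pool with Laplace noise, and the server averages the reports. This is not the distribution-aware algorithm of the paper, which the abstract does not describe. The data range $[0, 1]$, the geometric choice for $M$, the Beta choice for $μ$, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np


def naive_user_level_ldp_mean(user_samples, epsilon, rng):
    """Naive baseline (illustrative, not the paper's algorithm).

    Each user reports the mean of their local pool plus Laplace noise.
    Assuming samples lie in [0, 1], replacing a user's entire pool changes
    the local mean by at most 1, so Laplace noise of scale 1 / epsilon
    gives epsilon-user-level local differential privacy.
    """
    reports = []
    for samples in user_samples:
        # Clip to the assumed range so the sensitivity bound of 1 holds.
        local_mean = float(np.mean(np.clip(samples, 0.0, 1.0)))
        noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
        reports.append(local_mean + noise)
    # The server only sees the noisy reports, never m_u or the raw samples.
    return float(np.mean(reports))


rng = np.random.default_rng(0)
n = 20000

# Hypothetical M: geometric distribution on {1, 2, ...}.
m = rng.geometric(p=0.2, size=n)

# Hypothetical generative distribution: Beta(2, 5) on [0, 1], true mean 2/7.
user_samples = [rng.beta(2.0, 5.0, size=k) for k in m]

estimate = naive_user_level_ldp_mean(user_samples, epsilon=1.0, rng=rng)
print(f"true mean = {2 / 7:.4f}, private estimate = {estimate:.4f}")
```

The noise scale in this baseline is the same for a user holding one sample as for a user holding many, so it does not exploit the fact that local means of large pools concentrate around the true mean. Making use of that concentration when the $m_u$ vary across users is the kind of gain an $M$-dependent analysis is meant to capture.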