A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy
This work addresses privacy protection in distributed systems, particularly for applications with heavy-tailed data or widely varying user contributions, although it is an incremental improvement over existing two-stage methods.
The paper tackles mean estimation under user-level differential privacy by proposing a Huber loss minimization approach that reduces the bias caused by clipping and handles imbalanced user sample sizes. It shows that the method is less sensitive to the tails of the sample distribution and to imbalance among users.
Privacy protection of users' entire contribution of samples is important in distributed systems. The most effective approach is the two-stage scheme, which first finds a small interval and then obtains a refined estimate by clipping samples into that interval. However, the clipping operation induces bias, which becomes serious when the sample distribution is heavy-tailed. In addition, users with large local sample sizes can greatly increase the sensitivity, so the method is not suitable for imbalanced users. Motivated by these challenges, we propose a Huber loss minimization approach to mean estimation under user-level differential privacy. The connecting points of the Huber loss can be adaptively adjusted to deal with imbalanced users. Moreover, the approach avoids the clipping operation, thus significantly reducing the bias compared with the two-stage approach. We provide a theoretical analysis of our approach, which gives the noise strength required for privacy protection, as well as a bound on the mean squared error. The results show that the new method is much less sensitive to the imbalance of user-wise sample sizes and to the tail of the sample distribution. Finally, we perform numerical experiments to validate our theoretical analysis.
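The abstract describes the estimator only in words. The Python sketch below illustrates the general idea under stated assumptions: each user is summarized by a local mean, and the estimate minimizes a weighted sum of Huber losses over those means, with per-user connecting points, before Gaussian noise is added. Because the Huber derivative clips the residuals s - ybar_i and not the samples themselves, no sample clipping takes place. This is not the authors' algorithm. The weight rule (w_i proportional to min(n_i, weight_cap)), the connecting-point rule (T_i = threshold_scale / sqrt(n_i)), and the noise scale sigma are hypothetical placeholders, and sigma is not calibrated to any privacy guarantee.

```python
import math
import random


def huber_psi(u: float, t: float) -> float:
    """Derivative of the Huber loss with connecting point t: u clipped to [-t, t]."""
    return max(-t, min(t, u))


def huber_location(user_means, weights, thresholds, iters=200):
    """Minimize sum_i w_i * huber_{T_i}(s - user_means[i]) over s by bisection.

    The objective is convex, so its derivative
    g(s) = sum_i w_i * psi_{T_i}(s - ybar_i) is nondecreasing in s, with
    g(min ybar) <= 0 <= g(max ybar); a minimizer lies between the extreme means.
    """
    lo, hi = min(user_means), max(user_means)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        grad = sum(w * huber_psi(mid - y, t)
                   for y, w, t in zip(user_means, weights, thresholds))
        if grad > 0:
            hi = mid  # objective increasing here: minimizer lies to the left
        else:
            lo = mid  # objective non-increasing here: a minimizer lies at or right of mid
    return 0.5 * (lo + hi)


def private_huber_mean(user_samples, threshold_scale, weight_cap, sigma, rng):
    """Hypothetical sketch of a Huber-based user-level mean with Gaussian noise.

    ASSUMPTIONS (illustrative only, not taken from the paper):
    - each user i is summarized by its local mean ybar_i over n_i samples;
    - weight w_i is proportional to min(n_i, weight_cap), so one very large
      user cannot dominate the estimate;
    - connecting point T_i = threshold_scale / sqrt(n_i), scaling the quadratic
      region with the typical spread of user i's local mean;
    - sigma is supplied by the caller and is NOT calibrated to any
      (epsilon, delta) privacy guarantee here.
    """
    means = [sum(xs) / len(xs) for xs in user_samples]
    raw_w = [min(len(xs), weight_cap) for xs in user_samples]
    total = sum(raw_w)
    weights = [w / total for w in raw_w]
    thresholds = [threshold_scale / math.sqrt(len(xs)) for xs in user_samples]
    estimate = huber_location(means, weights, thresholds)
    return estimate + rng.gauss(0.0, sigma)  # Gaussian noise with caller-chosen scale


if __name__ == "__main__":
    rng = random.Random(0)
    # Heavy-tailed samples (Pareto, alpha = 3, true mean 1.5) from imbalanced users.
    users = [[rng.paretovariate(3.0) for _ in range(n)] for n in (5, 5, 50, 500)]
    est = private_huber_mean(users, threshold_scale=2.0, weight_cap=50,
                             sigma=0.01, rng=rng)
    print(f"noisy Huber estimate: {est:.3f}")
```

Bisection is used here only because the derivative of the convex objective is nondecreasing in s, so its root is bracketed by the smallest and largest user means. Any one-dimensional convex solver would serve equally well. For example, with user means [0, 0, 0, 100], equal weights, and all connecting points equal to 1, the minimizer is 1/3, while the plain mean is 25.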