User-Level Private Learning via Correlated Sampling
This work addresses privacy protection for users with multiple data points in machine learning, offering significant reductions in user requirements compared to single-sample settings.
The paper tackles the problem of user-level differential privacy where each user has multiple samples, showing that only O(log(1/δ)/ε) users are needed for (ε, δ)-DP algorithms and O_ε(d) users for ε-DP in the local model, with nearly-matching lower bounds.
Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample. In this work, we consider the setting where each user holds $m$ samples and the privacy protection is enforced at the level of each user's data. We show that, in this setting, we may learn with a much fewer number of users. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an $(ε, δ)$-DP algorithm using only $O(\log(1/δ)/ε)$ users. For $ε$-DP algorithms, we show that we can learn using only $O_ε(d)$ users even in the local model, where $d$ is the probabilistic representation dimension. In both cases, we show a nearly-matching lower bound on the number of users required. A crucial component of our results is a generalization of global stability [Bun et al., FOCS 2020] that allows the use of public randomness. Under this relaxed notion, we employ a correlated sampling strategy to show that the global stability can be boosted to be arbitrarily close to one, at a polynomial expense in the number of samples.