Learning from End User Data with Shuffled Differential Privacy over Kernel Densities
This work addresses privacy concerns in distributed data collection for machine learning with a method that balances privacy and accuracy, though its improvement over existing shuffled differential privacy (DP) techniques is incremental.
The paper tackles the problem of learning from private data distributed across end users by introducing a shuffled differential privacy protocol for estimating kernel density functions, achieving accuracy comparable to central DP. It demonstrates this approach for private classifier learning, with experiments showing favorable downstream performance and practical trade-offs.
We study a setting of collecting and learning from private data distributed across end users. In the shuffled model of differential privacy (DP), the end users partially protect their data locally before sharing it, and their data is also anonymized during its collection to enhance privacy. This model has recently become a prominent alternative to central DP, which requires full trust in a central data curator, and local DP, where fully local data protection takes a steep toll on downstream accuracy. Our main technical result is a shuffled DP protocol for privately estimating the kernel density function of a distributed dataset, with accuracy essentially matching central DP. We use it to privately learn a classifier from the end user data, by learning a private density function per class. Moreover, we show that the density function itself can recover the semantic content of its class, despite having been learned in the absence of any unprotected data. Our experiments show the favorable downstream performance of our approach, and highlight key downstream considerations and trade-offs in a practical ML deployment of shuffled DP.
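The per-class density classifier described in the abstract can be illustrated with a minimal sketch. It is not the paper's protocol. It assumes a Gaussian kernel approximated with random Fourier features, and it stands in for the shuffled DP mechanism with per-user Gaussian noise followed by a random permutation of the reports. The function names (make_rff, featurize, collect_class_sketch, classify) and the parameters bandwidth and noise_std are hypothetical. In particular, noise_std is not calibrated to any (epsilon, delta) guarantee, so the code shows only the data flow and provides no formal privacy.

```python
import numpy as np

def make_rff(dim, num_features, bandwidth, rng):
    """Draw random Fourier feature parameters for a Gaussian kernel (assumed kernel)."""
    W = rng.normal(0.0, 1.0 / bandwidth, size=(dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return W, b

def featurize(X, W, b):
    """Map points so that featurize(x) . featurize(y) approximates exp(-|x-y|^2 / (2 bandwidth^2))."""
    num_features = W.shape[1]
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def collect_class_sketch(X_class, W, b, noise_std, rng):
    """Simulate collecting one class's density sketch from its users.

    Assumption: each user adds Gaussian noise of scale noise_std / sqrt(n), so
    the summed noise has scale noise_std. This is an uncalibrated stand-in for
    a real shuffled DP mechanism, not a private protocol.
    """
    n = X_class.shape[0]
    reports = featurize(X_class, W, b)
    reports = reports + rng.normal(0.0, noise_std / np.sqrt(n), size=reports.shape)
    shuffled = reports[rng.permutation(n)]  # the "shuffler" drops the user order
    return shuffled.mean(axis=0)            # the analyzer only needs the average

def classify(Y, sketches, W, b):
    """Assign each query point to the class with the highest estimated density."""
    scores = featurize(Y, W, b) @ np.stack(sketches, axis=1)
    return scores.argmax(axis=1)

# Toy usage: two well-separated synthetic classes in the plane.
rng = np.random.default_rng(0)
W, b = make_rff(dim=2, num_features=256, bandwidth=1.0, rng=rng)
centers = np.array([[-2.0, -2.0], [2.0, 2.0]])
train = [c + 0.5 * rng.normal(size=(200, 2)) for c in centers]
sketches = [collect_class_sketch(X, W, b, noise_std=1.0, rng=rng) for X in train]
test_X = np.concatenate([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
test_y = np.repeat([0, 1], 50)
accuracy = (classify(test_X, sketches, W, b) == test_y).mean()
print(f"toy accuracy: {accuracy:.2f}")
```

In this sketch the server only ever sees noisy, order-free feature vectors and keeps one averaged vector per class. A new point is classified by taking its inner product with each class vector, which approximates evaluating that class's kernel density estimate at the point.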