Differentially Private Histograms in the Shuffle Model from Fake Users
This work addresses resource consumption issues in privacy-preserving data analysis for applications like secure data aggregation, though it is incremental as it builds on existing shuffle model techniques.
The paper tackles the problem of high message complexity in differentially private histogram protocols within the shuffle model by introducing a protocol that reduces message complexity to two per user when there are enough users, using fake rows and randomization, and shows small error in analysis and experiments.
There has been much recent work in the shuffle model of differential privacy, particularly for approximate $d$-bin histograms. While these protocols achieve low error, the number of messages sent by each user -- the message complexity -- has so far scaled with $d$ or the privacy parameters. The message complexity is an informative predictor of a shuffle protocol's resource consumption. We present a protocol whose message complexity is two when there are sufficiently many users. The protocol essentially pairs each row in the dataset with a fake row and performs a simple randomization on all rows. We show that the error introduced by the protocol is small, using rigorous analysis as well as experiments on real-world data. We also prove that corrupt users have a relatively low impact on our protocol's estimates.