CR DSJun 8, 2021

Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead

Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh

arXiv:2106.04247v125.460 citations

Originality Highly original

AI Analysis

This work addresses the trade-off between privacy and efficiency in machine learning data aggregation, offering a practical middle ground between central and local DP models.

The paper tackles the problem of achieving high accuracy with low communication overhead in the shuffled differential privacy model for binary summation and histogram aggregation, resulting in algorithms that approach central DP accuracy with communication costs nearly matching non-private baselines.

Differential privacy (DP) is a formal notion for quantifying the privacy loss of algorithms. Algorithms in the central model of DP achieve high accuracy but make the strongest trust assumptions whereas those in the local DP model make the weakest trust assumptions but incur substantial accuracy loss. The shuffled DP model (Bittau et al., 2017; Erlingsson et al., 2019; Cheu et al., 2019) has recently emerged as a feasible middle ground between the central and local models, providing stronger trust assumptions than the former while promising higher accuracies than the latter. In this paper, we obtain practical communication-efficient algorithms in the shuffled DP model for two basic aggregation primitives used in machine learning: 1) binary summation, and 2) histograms over a moderate number of buckets. Our algorithms achieve accuracy that is arbitrarily close to that of central DP algorithms with an expected communication per user essentially matching what is needed without any privacy constraints! We demonstrate the practicality of our algorithms by experimentally comparing their performance to several widely-used protocols such as Randomized Response (Warner, 1965) and RAPPOR (Erlingsson et al., 2014).

View on arXiv PDF

Similar