Asymptotically Optimal Locally Private Heavy Hitters via Parameterized Sketches
This work addresses privacy-preserving data analysis for large-scale user data, offering incremental improvements in error bounds and efficiency for local differential privacy algorithms.
The paper tackles the problem of frequency estimation and heavy hitters identification under local differential privacy, presenting algorithms that achieve optimal worst-case estimation error for the frequency oracle and heavy hitters when failure probability is at least inverse polynomial in the number of users, with server running time of ~O(n) and user running time of ~O(1). The frequency-oracle algorithm reduces estimation error compared to prior work, and the heavy hitters method improves worst-case error by a factor of Ω(√log n) while being easily implementable.
We present two new local differentially private algorithms for frequency estimation. One solves the fundamental frequency oracle problem; the other solves the well-known heavy hitters identification problem. Consistent with prior art, these are randomized algorithms. As a function of failure probability~$β$, the former achieves optimal worst-case estimation error for every~$β$, while the latter is optimal when~$β$ is at least inverse polynomial in~$n$, the number of users. In both algorithms, server running time is~$\tilde{O}(n)$ while user running time is~$\tilde{O}(1)$. Our frequency-oracle algorithm achieves lower estimation error than the prior works of Bassily et al. (NeurIPS 2017). On the other hand, our heavy hitters identification method is as easily implementable as as TreeHist (Bassily et al., 2017) and has superior worst-case error, by a factor of $Ω(\sqrt{\log n})$.