Linear Queries Estimation with Local Differential Privacy
This work addresses high-dimensional data analysis with privacy constraints, offering improved accuracy for tasks like distribution estimation, though it is incremental in building on prior methods.
The paper tackles the problem of estimating linear queries under local differential privacy, providing algorithms for both offline and adaptive settings with tight error bounds; for offline, it achieves L2 error independent of dimension d in certain regimes, and for adaptive, it achieves optimal L∞ error matching known lower bounds.
We study the problem of estimating a set of $d$ linear queries with respect to some unknown distribution $\mathbf{p}$ over a domain $\mathcal{J}=[J]$ based on a sensitive data set of $n$ individuals under the constraint of local differential privacy. This problem subsumes a wide range of estimation tasks, e.g., distribution estimation and $d$-dimensional mean estimation. We provide new algorithms for both the offline (non-adaptive) and adaptive versions of this problem. In the offline setting, the set of queries are fixed before the algorithm starts. In the regime where $n\lesssim d^2/\log(J)$, our algorithms attain $L_2$ estimation error that is independent of $d$, and is tight up to a factor of $\tilde{O}\left(\log^{1/4}(J)\right)$. For the special case of distribution estimation, we show that projecting the output estimate of an algorithm due to [Acharya et al. 2018] on the probability simplex yields an $L_2$ error that depends only sub-logarithmically on $J$ in the regime where $n\lesssim J^2/\log(J)$. These results show the possibility of accurate estimation of linear queries in the high-dimensional settings under the $L_2$ error criterion. In the adaptive setting, the queries are generated over $d$ rounds; one query at a time. In each round, a query can be chosen adaptively based on all the history of previous queries and answers. We give an algorithm for this problem with optimal $L_{\infty}$ estimation error (worst error in the estimated values for the queries w.r.t. the data distribution). Our bound matches a lower bound on the $L_{\infty}$ error for the offline version of this problem [Duchi et al. 2013].