On Differentially Private U Statistics
This work addresses the gap in differentially private estimation for U-statistics, which is important for statisticians and data analysts handling sensitive data in applications like testing and network analysis, representing a novel method for a known bottleneck rather than an incremental improvement.
The paper tackles the problem of privately estimating parameters using U-statistics, which are common in nonparametric tests and network analysis, by showing that existing private mean estimation methods lead to suboptimal error rates and proposing a new thresholding-based approach with local Hájek projections to achieve nearly optimal private error for non-degenerate cases and strong indications for degenerate ones.
We consider the problem of privately estimating a parameter $\mathbb{E}[h(X_1,\dots,X_k)]$, where $X_1$, $X_2$, $\dots$, $X_k$ are i.i.d. data from some distribution and $h$ is a permutation-invariant function. Without privacy constraints, standard estimators are U-statistics, which commonly arise in a wide range of problems, including nonparametric signed rank tests, symmetry testing, uniformity testing, and subgraph counts in random networks, and can be shown to be minimum variance unbiased estimators under mild conditions. Despite the recent outpouring of interest in private mean estimation, privatizing U-statistics has received little attention. While existing private mean estimation algorithms can be applied to obtain confidence intervals, we show that they can lead to suboptimal private error, e.g., constant-factor inflation in the leading term, or even $Θ(1/n)$ rather than $O(1/n^2)$ in degenerate settings. To remedy this, we propose a new thresholding-based approach using \emph{local Hájek projections} to reweight different subsets of the data. This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.