Subset-Based Instance Optimality in Private Estimation
This work addresses the problem of improving privacy guarantees in data estimation for researchers and practitioners in differential privacy, though it is incremental by building on prior benchmarks.
The authors introduced a stronger definition of instance optimality for differentially private estimation, requiring algorithms to compete with benchmarks that know the dataset in advance and perform well on large subsets, and they constructed private algorithms achieving this for properties like means and quantiles, with their mean estimator matching or exceeding existing methods asymptotically.
We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.