Perturb-and-Project: Differentially Private Similarities and Marginals
This work addresses privacy-preserving data analysis for datasets with sparse features, offering incremental improvements over prior methods that were limited to even k.
The paper tackles the problem of releasing differentially private pairwise cosine similarities and k-way marginal queries, achieving novel efficient algorithms with stronger guarantees for t-sparse datasets when t ≤ n^(5/6)/log n.
We revisit the input perturbations framework for differential privacy where noise is added to the input $A\in \mathcal{S}$ and the result is then projected back to the space of admissible datasets $\mathcal{S}$. Through this framework, we first design novel efficient algorithms to privately release pair-wise cosine similarities. Second, we derive a novel algorithm to compute $k$-way marginal queries over $n$ features. Prior work could achieve comparable guarantees only for $k$ even. Furthermore, we extend our results to $t$-sparse datasets, where our efficient algorithms yields novel, stronger guarantees whenever $t\le n^{5/6}/\log n\,.$ Finally, we provide a theoretical perspective on why \textit{fast} input perturbation algorithms works well in practice. The key technical ingredients behind our results are tight sum-of-squares certificates upper bounding the Gaussian complexity of sets of solutions.