LGCRDSMar 18, 2025

Better Private Distribution Testing by Leveraging Unverified Auxiliary Data

arXiv:2503.14709v13 citationsh-index: 48COLT
Originality Incremental advance
AI Analysis

This addresses the challenge of performing hypothesis testing on sensitive data while using potentially untrusted prior knowledge, which is incremental as it extends an existing framework to the private setting.

The paper tackles the problem of differentially private distribution testing by leveraging auxiliary data, designing algorithms for uniformity, identity, and closeness testing with sample complexity that scales with the quality of the auxiliary information, and proves these are optimal up to logarithmic factors.

We extend the framework of augmented distribution testing (Aliakbarpour, Indyk, Rubinfeld, and Silwal, NeurIPS 2024) to the differentially private setting. This captures scenarios where a data analyst must perform hypothesis testing tasks on sensitive data, but is able to leverage prior knowledge (public, but possibly erroneous or untrusted) about the data distribution. We design private algorithms in this augmented setting for three flagship distribution testing tasks, uniformity, identity, and closeness testing, whose sample complexity smoothly scales with the claimed quality of the auxiliary information. We complement our algorithms with information-theoretic lower bounds, showing that their sample complexity is optimal (up to logarithmic factors).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes