Better Private Distribution Testing by Leveraging Unverified Auxiliary Data
This addresses the challenge of performing hypothesis testing on sensitive data while using potentially untrusted prior knowledge, which is incremental as it extends an existing framework to the private setting.
The paper tackles the problem of differentially private distribution testing by leveraging auxiliary data, designing algorithms for uniformity, identity, and closeness testing with sample complexity that scales with the quality of the auxiliary information, and proves these are optimal up to logarithmic factors.
We extend the framework of augmented distribution testing (Aliakbarpour, Indyk, Rubinfeld, and Silwal, NeurIPS 2024) to the differentially private setting. This captures scenarios where a data analyst must perform hypothesis testing tasks on sensitive data, but is able to leverage prior knowledge (public, but possibly erroneous or untrusted) about the data distribution. We design private algorithms in this augmented setting for three flagship distribution testing tasks, uniformity, identity, and closeness testing, whose sample complexity smoothly scales with the claimed quality of the auxiliary information. We complement our algorithms with information-theoretic lower bounds, showing that their sample complexity is optimal (up to logarithmic factors).