Differentially Private Maximal Information Coefficients
This work addresses privacy concerns for statisticians and data analysts using MIC on sensitive datasets, though it is incremental as it builds on existing differential privacy methods.
The authors tackled the problem of computing the Maximal Information Coefficient (MIC) on sensitive data without leaking private information by introducing differentially private algorithms, specifically the MICr statistic, which outperforms the Laplace mechanism and provides usable accuracy on moderately large datasets.
The Maximal Information Coefficient (MIC) is a powerful statistic to identify dependencies between variables. However, it may be applied to sensitive data, and publishing it could leak private information. As a solution, we present algorithms to approximate MIC in a way that provides differential privacy. We show that the natural application of the classic Laplace mechanism yields insufficient accuracy. We therefore introduce the MICr statistic, which is a new MIC approximation that is more compatible with differential privacy. We prove MICr is a consistent estimator for MIC, and we provide two differentially private versions of it. We perform experiments on a variety of real and synthetic datasets. The results show that the private MICr statistics significantly outperform direct application of the Laplace mechanism. Moreover, experiments on real-world datasets show accuracy that is usable when the sample size is at least moderately large.