ML CR LG MEAug 2, 2023

Improving the Variance of Differentially Private Randomized Experiments through Clustering

Adel Javanmard, Vahab Mirrokni, Jean Pouget-Abadie

arXiv:2308.00957v34.31 citationsh-index: 62

Originality Incremental advance

AI Analysis

This work addresses the privacy-variance trade-off in differentially private causal inference, which is important for researchers and practitioners conducting sensitive randomized experiments, though it appears incremental as it builds on existing differential privacy mechanisms.

The paper tackles the problem of increased variance in causal effect estimation when using differential privacy in randomized experiments, proposing a new mechanism called 'Cluster-DP' that leverages data clustering to improve the privacy-variance trade-off. The results show that selecting higher-quality clusters can decrease the variance penalty without compromising privacy, with empirical evaluations on real and simulated data demonstrating performance gains over baselines.

Estimating causal effects from randomized experiments is only possible if participants are willing to disclose their potentially sensitive responses. Differential privacy, a widely used framework for ensuring an algorithms privacy guarantees, can encourage participants to share their responses without the risk of de-anonymization. However, many mechanisms achieve differential privacy by adding noise to the original dataset, which reduces the precision of causal effect estimation. This introduces a fundamental trade-off between privacy and variance when performing causal analyses on differentially private data. In this work, we propose a new differentially private mechanism, "Cluster-DP", which leverages a given cluster structure in the data to improve the privacy-variance trade-off. While our results apply to any clustering, we demonstrate that selecting higher-quality clusters, according to a quality metric we introduce, can decrease the variance penalty without compromising privacy guarantees. Finally, we evaluate the theoretical and empirical performance of our Cluster-DP algorithm on both real and simulated data, comparing it to common baselines, including two special cases of our algorithm: its unclustered version and a uniform-prior version.

View on arXiv PDF

Similar