CURATE: Scaling-up Differentially Private Causal Graph Discovery
This work addresses privacy concerns in causal graph discovery for sensitive observational data, offering an incremental improvement over existing differentially private methods by optimizing privacy budget allocation.
The paper tackles the problem of scaling up differentially private causal graph discovery by introducing CURATE, a framework with adaptive privacy budgeting that minimizes error probability for constraint-based algorithms and maximizes optimization iterations for score-based algorithms while bounding cumulative leakage. The result shows that CURATE achieves higher utility with less privacy leakage compared to existing DP-CGD algorithms, as validated through experiments on multiple datasets.
Causal Graph Discovery (CGD) is the process of estimating the underlying probabilistic graphical model that represents joint distribution of features of a dataset. CGD-algorithms are broadly classified into two categories: (i) Constraint-based algorithms (outcome depends on conditional independence (CI) tests), (ii) Score-based algorithms (outcome depends on optimized score-function). Since, sensitive features of observational data is prone to privacy-leakage, Differential Privacy (DP) has been adopted to ensure user privacy in CGD. Adding same amount of noise in this sequential-natured estimation process affects the predictive performance of the algorithms. As initial CI tests in constraint-based algorithms and later iterations of the optimization process of score-based algorithms are crucial, they need to be more accurate, less noisy. Based on this key observation, we present CURATE (CaUsal gRaph AdapTivE privacy), a DP-CGD framework with adaptive privacy budgeting. In contrast to existing DP-CGD algorithms with uniform privacy budgeting across all iterations, CURATE allows adaptive privacy budgeting by minimizing error probability (for constraint-based), maximizing iterations of the optimization problem (for score-based) while keeping the cumulative leakage bounded. To validate our framework, we present a comprehensive set of experiments on several datasets and show that CURATE achieves higher utility compared to existing DP-CGD algorithms with less privacy-leakage.