ME · LG · ML · May 6

PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data

arXiv:2605.04838 · 7.0 · h-index: 41
Predicted impact: top 82% in ME · last 90 days · Originality: Highly original
AI Analysis

For researchers performing causal discovery from incomplete data, PAIR-CI addresses the miscalibration of standard impute-then-test approaches, a known bottleneck in the field.

PAIR-CI is a nonparametric conditional independence test that restores calibration in causal discovery with incomplete data by integrating multiple imputation through a paired permutation design. In simulations its false positive rate averages below the nominal 5% level, whereas existing imputation-based tests reach 28-45% under MNAR. Inside the PC algorithm it reduces structural Hamming distance by up to 44% on large graphs.

The standard constraint-based paradigm for causal discovery with incomplete data (impute first, test second) is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spurious conditional dependence. We introduce PAIR-CI, a nonparametric CI test that restores calibration by integrating multiple imputation directly into the inferential procedure via a paired permutation design. PAIR-CI compares cross-validated models that include and exclude the candidate variable while receiving the same imputed conditioning set, forcing imputation error to cancel in their loss difference rather than contaminate the test statistic. A provably consistent variance estimator jointly accounts for uncertainty arising from cross-validation and multiple imputation; to our knowledge, this is the first formal unification of these two inferential frameworks. In simulations, existing imputation-based CI tests exhibit false positive rates of 28-45% when data are missing not at random (MNAR), whereas PAIR-CI averages below the nominal 5% level across data-generating processes and missingness mechanisms. These gains are largest in nonlinear settings and grow with causal graph size: when integrated into the PC algorithm, PAIR-CI reduces structural Hamming distance by 8% on 10-variable nonlinear graphs, 15% on 30-variable equivalents, and up to 44% on the 56-variable HAILFINDER network, with stable performance in all settings.
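The paired comparison described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`out_of_fold_sq_error`, `paired_ci_pvalue`), the linear models, the hot-deck imputer, and the sign-flip null are all assumptions chosen for brevity, and the sketch omits the paper's variance estimator and makes no calibration claim. It only shows the structural idea: both models see the same imputed conditioning variable and the same folds, and the test statistic is their per-sample loss difference averaged over imputations.

```python
import numpy as np


def out_of_fold_sq_error(design, y, folds):
    """Per-sample squared error of a least-squares fit, evaluated out of fold."""
    losses = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        losses[test_idx] = (y[test_idx] - design[test_idx] @ coef) ** 2
    return losses


def paired_ci_pvalue(x, y, z, n_imputations=5, n_folds=5, n_perm=2000, seed=0):
    """Toy test of X independent of Y given Z, where 1-D z may contain NaNs.

    Assumptions (not from the paper): hot-deck imputation, linear models,
    and a one-sided sign-flip permutation of the paired loss differences.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    missing = np.isnan(z)
    observed = z[~missing]
    diffs = np.zeros(n)
    for _ in range(n_imputations):
        z_imp = z.copy()
        # Hypothetical imputer: resample from the observed values of z.
        z_imp[missing] = rng.choice(observed, size=missing.sum())
        # Both models share the same imputed z and the same folds.
        folds = np.array_split(rng.permutation(n), n_folds)
        base = np.column_stack([np.ones(n), z_imp])   # excludes candidate x
        full = np.column_stack([base, x])             # includes candidate x
        diffs += (out_of_fold_sq_error(base, y, folds)
                  - out_of_fold_sq_error(full, y, folds))
    diffs /= n_imputations
    stat = diffs.mean()  # positive when x improves out-of-fold prediction
    # Sign-flip null: randomly swap which model each paired loss belongs to.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, n))
    null = (signs * diffs).mean(axis=1)
    return (1 + np.sum(null >= stat)) / (1 + n_perm)


# Synthetic demo: z is 30% missing completely at random.
rng = np.random.default_rng(1)
n = 300
z_true = rng.normal(size=n)
x = rng.normal(size=n)
z = z_true.copy()
z[rng.random(n) < 0.3] = np.nan

y_dep = z_true + 2.0 * x + rng.normal(size=n)  # y depends on x given z
y_ind = z_true + rng.normal(size=n)            # y does not depend on x

p_dep = paired_ci_pvalue(x, y_dep, z)
p_ind = paired_ci_pvalue(x, y_ind, z)
print(f"dependent: p={p_dep:.4f}  independent: p={p_ind:.4f}")
```

Sharing the imputed values and the folds between the two models is the design choice that matters here: whatever error the imputer introduces enters both loss vectors, so it largely drops out of their difference. The demo keeps `x` independent of `z`, so it does not probe the harder MNAR settings the abstract reports on.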

