LG ML · Sep 1, 2025

Effects of Distributional Biases on Gradient-Based Causal Discovery in the Bivariate Categorical Case

Tim Schwabe, Moritz Lange, Laurenz Wiskott, Maribel Acosta

arXiv:2509.01621v1 · 4.1 · h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses the reliability of causal discovery methods, which matters to researchers who depend on them. However, it is incremental because it focuses on specific biases in a controlled setup.

The paper tackles distributional biases that affect gradient-based causal discovery on bivariate categorical data. It shows that marginal distribution asymmetry and marginal distribution shift asymmetry can skew causal learning, demonstrates how these biases can be controlled, and finds empirically that eliminating competition between possible factorizations can make models robust to them.
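To make the two biases concrete, here is a minimal NumPy sketch under stated assumptions. It uses a hypothetical ground-truth model A → B over n categories in which p(A) and each row of p(B|A) are drawn from a symmetric Dirichlet(α) prior, and "interventions" that resample p(A) while keeping the mechanism p(B|A) fixed. It reports the average marginal entropies (related to Marginal Distribution Asymmetry) and the average total-variation shift of each marginal between consecutive interventions (related to Marginal Distribution Shift Asymmetry). The function names, α = 1, and the intervention scheme are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability categories."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def sample_model(n, alpha, rng):
    """Hypothetical A -> B model: p(A) ~ Dir(alpha), each row p(B|A=a) ~ Dir(alpha)."""
    p_a = rng.dirichlet(np.full(n, alpha))
    p_b_given_a = rng.dirichlet(np.full(n, alpha), size=n)  # shape (n, n), rows sum to 1
    return p_a, p_b_given_a

def marginal_b(p_a, p_b_given_a):
    """p(B) = sum_a p(a) p(B | A=a)."""
    return p_a @ p_b_given_a

def tv(p, q):
    """Total-variation distance between two categorical distributions."""
    return 0.5 * float(np.abs(p - q).sum())

def simulate(n=10, alpha=1.0, n_models=200, n_interventions=20, seed=0):
    """Average H(A), H(B), and per-intervention TV shift of p(A) and p(B)."""
    rng = np.random.default_rng(seed)
    h_a, h_b, shift_a, shift_b = [], [], [], []
    for _ in range(n_models):
        p_a, p_b_given_a = sample_model(n, alpha, rng)
        p_b = marginal_b(p_a, p_b_given_a)
        h_a.append(entropy(p_a))
        h_b.append(entropy(p_b))
        for _ in range(n_interventions):
            # Intervene on the cause: resample p(A), keep the mechanism p(B|A).
            new_p_a = rng.dirichlet(np.full(n, alpha))
            new_p_b = marginal_b(new_p_a, p_b_given_a)
            shift_a.append(tv(p_a, new_p_a))
            shift_b.append(tv(p_b, new_p_b))
            p_a, p_b = new_p_a, new_p_b
    return (float(np.mean(h_a)), float(np.mean(h_b)),
            float(np.mean(shift_a)), float(np.mean(shift_b)))

h_a, h_b, s_a, s_b = simulate()
print(f"H(A)={h_a:.3f}  H(B)={h_b:.3f}  shift(A)={s_a:.3f}  shift(B)={s_b:.3f}")
```

In this particular setup, p(B) is a mixture of the rows of p(B|A), so it tends to be closer to uniform than p(A) and its entropy is typically higher. Because multiplying by a row-stochastic matrix cannot increase total-variation distance, the shift in p(B) can never exceed the shift in p(A) under these interventions, so the two marginals move at different speeds even though the data are fully synthetic.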

Gradient-based causal discovery shows great potential for deducing causal structure from data in an efficient and scalable way. However, these approaches can be susceptible to distributional biases in the data they are trained on. We identify two such biases: Marginal Distribution Asymmetry, where differences in the entropies of the marginal distributions skew causal learning toward certain factorizations, and Marginal Distribution Shift Asymmetry, where repeated interventions cause faster shifts in some variables than in others. For the bivariate categorical setup with Dirichlet priors, we illustrate how these biases can occur even in controlled synthetic data. To examine their impact on gradient-based methods, we employ two simple models that derive causal factorizations by learning marginal or conditional data distributions, a common strategy in gradient-based causal discovery. We demonstrate how these models can be susceptible to both biases, and we additionally show how the biases can be controlled. An empirical evaluation of two related, existing approaches indicates that eliminating competition between possible causal factorizations can make models robust to the presented biases.
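The abstract describes models that derive a causal factorization by learning marginal or conditional distributions. The sketch below is a hypothetical, simplified illustration of that general strategy, not the authors' models. Each candidate factorization, p(A)p(B|A) or p(B)p(A|B), is parameterized by softmax logits and trained by gradient descent on the expected negative log-likelihood of a known joint. One heuristic used in gradient-based causal discovery compares how quickly each factorization adapts after an intervention; the sketch measures this as the loss accumulated during adaptation. The class and function names, the Dirichlet(1) parameters, the learning rate, and the step counts are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class Factorization:
    """Hypothetical learner for one factorization of an n x n categorical joint.

    a_to_b=True models p(A) p(B|A); a_to_b=False models p(B) p(A|B).
    """

    def __init__(self, n, a_to_b):
        self.a_to_b = a_to_b
        self.marg_logits = np.zeros(n)        # logits of p(X)
        self.cond_logits = np.zeros((n, n))   # row x: logits of p(Y | X=x)

    def _orient(self, joint):
        """Return the joint with rows = conditioning variable X, columns = Y."""
        return joint if self.a_to_b else joint.T

    def nll(self, joint):
        """Expected negative log-likelihood (cross-entropy) of `joint` under the model."""
        q = self._orient(joint)
        log_p = (np.log(softmax(self.marg_logits))[:, None]
                 + np.log(softmax(self.cond_logits, axis=1)))
        return float(-(q * log_p).sum())

    def step(self, joint, lr=1.0):
        """One gradient-descent step on nll(joint), using closed-form softmax gradients."""
        q = self._orient(joint)
        q_x = q.sum(axis=1)
        grad_marg = softmax(self.marg_logits) - q_x
        grad_cond = q_x[:, None] * softmax(self.cond_logits, axis=1) - q
        self.marg_logits -= lr * grad_marg
        self.cond_logits -= lr * grad_cond

def adaptation_loss(model, joint, steps=20, lr=1.0):
    """Sum of NLLs accumulated while adapting to `joint` (lower = faster adaptation)."""
    total = 0.0
    for _ in range(steps):
        total += model.nll(joint)
        model.step(joint, lr)
    return total

rng = np.random.default_rng(0)
n = 5
p_a = rng.dirichlet(np.ones(n))
p_b_given_a = rng.dirichlet(np.ones(n), size=n)   # row a: p(B | A=a)
joint = p_a[:, None] * p_b_given_a                # joint[a, b] = p(a) p(b|a)

models = {"A->B": Factorization(n, True), "B->A": Factorization(n, False)}
for model in models.values():
    for _ in range(500):
        model.step(joint)

# Intervention on A: resample p(A), keep the mechanism p(B|A).
new_joint = rng.dirichlet(np.ones(n))[:, None] * p_b_given_a
for name, model in models.items():
    print(name, adaptation_loss(model, new_joint))
```

Using exact expected gradients instead of sampled minibatches keeps the sketch deterministic; a real experiment would train on samples. Both factorizations can represent the same observational joint, so the fit before the intervention does not pick a direction on its own. According to the abstract, comparisons of this kind can be skewed by the two biases above.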

