ML · IT · LG · Jul 29, 2020

Information-Theoretic Approximation to Causal Models

arXiv:2007.15047v2 · 1 citation

AI Analysis

This work addresses causal inference and is aimed at researchers in statistics and machine learning. It is incremental: it builds on existing invariance principles and shows limited performance improvements.

The paper tackles the problem of inferring the causal direction and causal effect between two discrete variables from observational and interventional data. It embeds the empirical distributions into a higher-dimensional space and approximates causal models by solving a linear optimization problem. Experimental results show that it lags behind state-of-the-art methods on synthetic additive noise data, while being competitive in some cases on multiplicative noise data and real-world data.

Inferring the causal direction and causal effect between two discrete random variables X and Y from a finite sample is often a crucial problem and a challenging task. However, if we have access to observational and interventional data, it is possible to solve that task. If X is causing Y, then it does not matter if we observe an effect in Y by observing changes in X or by intervening actively on X. This invariance principle creates a link between observational and interventional distributions in a higher-dimensional probability space. We embed distributions that originate from samples of X and Y into that higher-dimensional space such that the embedded distribution is closest to the distributions that follow the invariance principle, with respect to the relative entropy. This allows us to calculate, for a given empirical distribution, the best information-theoretic approximation that follows an assumed underlying causal model. We show that this information-theoretic approximation to causal models (IACM) can be done by solving a linear optimization problem. In particular, by approximating the empirical distribution to a monotonic causal model, we can calculate probabilities of causation. We can also use IACM for causal discovery problems in the bivariate, discrete case. However, experimental results on labeled synthetic data from additive noise models show that our causal discovery approach is lagging behind state-of-the-art approaches because the invariance principle encodes only a necessary condition for causal relations. Nevertheless, for synthetic multiplicative noise data and real-world data, our approach can compete in some cases with alternative methods.
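The abstract states that approximating the data by a monotonic causal model makes probabilities of causation computable. As a reference point, the sketch below applies the standard identification formulas of Tian and Pearl for binary X and Y under monotonicity (no unit has Y=1 under do(X=0) but Y=0 under do(X=1)). These formulas combine observational counts with interventional estimates of P(Y=1 | do(X=x)). This is not the IACM procedure itself: it assumes monotonicity holds exactly rather than approximating it, and the function, class, and parameter names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CausationProbabilities:
    pns: float  # probability of necessity and sufficiency
    pn: float  # probability of necessity
    ps: float  # probability of sufficiency


def probabilities_of_causation(
    obs_counts: dict[tuple[int, int], int],
    p_y_do_x1: float,
    p_y_do_x0: float,
) -> CausationProbabilities:
    """Hypothetical helper: Tian-Pearl formulas for binary X, Y, assuming
    Y is monotonic in X.

    obs_counts -- observational counts keyed by (x, y) with x, y in {0, 1}
    p_y_do_x1  -- P(Y=1 | do(X=1)), e.g. estimated from interventional data
    p_y_do_x0  -- P(Y=1 | do(X=0)), e.g. estimated from interventional data
    """
    if obs_counts.get((1, 1), 0) == 0 or obs_counts.get((0, 0), 0) == 0:
        raise ValueError("need observations of (x=1, y=1) and (x=0, y=0)")
    if p_y_do_x1 < p_y_do_x0:
        # Monotonicity implies P(y | do(x)) >= P(y | do(x')).
        raise ValueError("interventional estimates contradict monotonicity")

    n = sum(obs_counts.values())
    p_y = (obs_counts.get((0, 1), 0) + obs_counts[(1, 1)]) / n  # P(y)
    p_x_y = obs_counts[(1, 1)] / n  # P(x, y)
    p_xp_yp = obs_counts[(0, 0)] / n  # P(x', y')

    pns = p_y_do_x1 - p_y_do_x0  # P(y_x) - P(y_x')
    pn = (p_y - p_y_do_x0) / p_x_y  # [P(y) - P(y_x')] / P(x, y)
    ps = (p_y_do_x1 - p_y) / p_xp_yp  # [P(y_x) - P(y)] / P(x', y')
    return CausationProbabilities(pns=pns, pn=pn, ps=ps)


if __name__ == "__main__":
    counts = {(1, 1): 40, (1, 0): 10, (0, 1): 10, (0, 0): 40}
    print(probabilities_of_causation(counts, p_y_do_x1=0.8, p_y_do_x0=0.2))
```

In the example, the observational data give P(Y=1 | X=1) = 0.8 and P(Y=1 | X=0) = 0.2. The interventional estimates match these, so there is no confounding, and the sketch yields PNS ≈ 0.6 and PN ≈ PS ≈ 0.75. With finite samples, these plug-in estimates can fall outside [0, 1]. This is one symptom of data that a monotonic model fits only approximately, which is the situation the abstract addresses by first approximating the empirical distribution with a monotonic causal model.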

