Dominik Janzing

ML
h-index27
62papers
5,324citations
Novelty52%
AI Score59

62 Papers

MEJun 14, 2022Code
DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models

Patrick Blöbaum, Peter Götz, Kailash Budhathoki et al.

We present DoWhy-GCM, an extension of the DoWhy Python library, which leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM addresses diverse causal queries, such as identifying the root causes of outliers and distributional changes, attributing causal influences to the data generating process of each node, or diagnosis of causal structures. With DoWhy-GCM, users typically specify cause-effect relations via a causal graph, fit causal mechanisms, and pose causal queries -- all with just a few lines of code. The general documentation is available at https://www.pywhy.org/dowhy and the DoWhy-GCM specific code at https://github.com/py-why/dowhy/tree/main/dowhy/gcm.

LGMar 8, 2022
Score matching enables causal discovery of nonlinear additive noise models

Paul Rolland, Volkan Cevher, Matthäus Kleindessner et al.

This paper demonstrates how to recover causal graphs from the score of the data distribution in non-linear additive (Gaussian) noise models. Using score matching algorithms as a building block, we show how to design a new generation of scalable causal discovery methods. To showcase our approach, we also propose a new efficient method for approximating the score's Jacobian, enabling to recover the causal graph. Empirically, we find that the new algorithm, called SCORE, is competitive with state-of-the-art causal discovery methods while being significantly faster.

48.1AIMay 29
Evaluating Bivariate Causal Statements Based on Mutual Compatibility

Erik Jahn, Dominik Janzing

For many real-world systems, causal ground truth is difficult to obtain, making claims about causal effects hard to assess. We develop methods for evaluating collections of $\binom{n}{2}$ bivariate causal statements over a set of $n$ variables. In the setting of acyclic linear statements, any such collection can be extended to a unique multivariate causal model, but we argue that this induced model is implausible if it imposes substantial additional confounding to explain observed correlations. We introduce a compatibility score that quantifies this notion of plausibility, notably without relying on the faithfulness assumption. Additionally, we define an incompatibility score for purely graphical bivariate causal statements, based on global consistency constraints that are derived from acyclicity and faithfulness assumptions. We give theoretical and empirical evidence that both scores can successfully distinguish correct from incorrect causal statements in generic settings. Moreover, we demonstrate the practical applicability of our methods by analyzing causal claims made by large language models. Our work aims to provide a foundation for assessing the reliability of causal information derived from human experts or artificial intelligence in settings where alternative forms of validation are unavailable.

23.8AIMay 29
Formalizing and falsifying causal pathways of rare events

Anahita Haghighat, Dominik Janzing

Building on recent formalizations of root cause analysis for rare events (``outliers'') in structural equation models, we propose a formal definition of a causal pathway and discuss its testable implications. We identify conditions under which these implications depend only on a causal abstraction defined by the pathway of rare events, rather than on the full causal graph of the underlying system. Accordingly, we introduce an abstraction of causal structure to pathways of rare events that bridges simple verbal causal explanations and detailed causal modeling.

LGJul 18, 2023
Self-Compatibility: Evaluating Causal Discovery without Ground Truth

Philipp M. Faller, Leena Chennuru Vankadara, Atalanti A. Mastakouri et al.

As causal ground truth is incredibly rare, causal discovery algorithms are commonly only evaluated on simulated data. This is concerning, given that simulations reflect preconceptions about generating processes regarding noise distributions, model classes, and more. In this work, we propose a novel method for falsifying the output of a causal discovery algorithm in the absence of ground truth. Our key insight is that while statistical learning seeks stability across subsets of data points, causal learning should seek stability across subsets of variables. Motivated by this insight, our method relies on a notion of compatibility between causal graphs learned on different subsets of variables. We prove that detecting incompatibilities can falsify wrongly inferred causal relations due to violation of assumptions or errors from finite sample effects. Although passing such compatibility tests is only a necessary criterion for good performance, we argue that it provides strong evidence for the causal models whenever compatibility entails strong implications for the joint distribution. We also demonstrate experimentally that detection of incompatibilities can aid in causal model selection.

MENov 15, 2022
Phenomenological Causality

Dominik Janzing, Sergio Hernan Garrido Mejia

Discussions on causal relations in real life often consider variables for which the definition of causality is unclear since the notion of interventions on the respective variables is obscure. Asking 'what qualifies an action for being an intervention on the variable X' raises the question whether the action impacted all other variables only through X or directly, which implicitly refers to a causal model. To avoid this known circularity, we instead suggest a notion of 'phenomenological causality' whose basic concept is a set of elementary actions. Then the causal structure is defined such that elementary actions change only the causal mechanism at one node (e.g. one of the causal conditionals in the Markov factorization). This way, the Principle of Independent Mechanisms becomes the defining property of causal structure in domains where causality is a more abstract phenomenon rather than being an objective fact relying on hard-wired causal links between tangible objects. We describe this phenomenological approach to causality for toy and hypothetical real-world examples and argue that it is consistent with the causal Markov condition when the system under consideration interacts with other variables that control the elementary actions.

MEOct 20, 2023
Assumption violations in causal discovery and the robustness of score matching

Francesco Montagna, Atalanti A. Mastakouri, Elias Eulig et al.

When domain knowledge is limited and experimentation is restricted by ethical, financial, or time constraints, practitioners turn to observational causal discovery methods to recover the causal structure, exploiting the statistical properties of their data. Because causal discovery without further assumptions is an ill-posed problem, each algorithm comes with its own set of usually untestable assumptions, some of which are hard to meet in real datasets. Motivated by these considerations, this paper extensively benchmarks the empirical performance of recent causal discovery methods on observational i.i.d. data generated under different background conditions, allowing for violations of the critical assumptions required by each selected approach. Our experimental findings show that score matching-based methods demonstrate surprising performance in the false positive and false negative rate of the inferred graph in these challenging scenarios, and we provide theoretical insights into their performance. This work is also the first effort to benchmark the stability of causal discovery algorithms with respect to the values of their hyperparameters. Finally, we hope this paper will set a new standard for the evaluation of causal discovery methods and can serve as an accessible entry point for practitioners interested in the field, highlighting the empirical implications of different algorithm choices.

AIApr 23, 2023
Meaningful Causal Aggregation and Paradoxical Confounding

Yuchen Zhu, Kailash Budhathoki, Jonas Kuebler et al.

In aggregated variables the impact of interventions is typically ill-defined because different micro-realizations of the same macro-intervention can result in different changes of downstream macro-variables. We show that this ill-definedness of causality on aggregated variables can turn unconfounded causal relations into confounded ones and vice versa, depending on the respective micro-realization. We argue that it is practically infeasible to only use aggregated causal systems when we are free from this ill-definedness. Instead, we need to accept that macro causal relations are typically defined only with reference to the micro states. On the positive side, we show that cause-effect relations can be aggregated when the macro interventions are such that the distribution of micro states is the same as in the observational distribution; we term this natural macro interventions. We also discuss generalizations of this observation.

LGJun 26, 2022
Explaining the root causes of unit-level changes

Kailash Budhathoki, George Michailidis, Dominik Janzing

Existing methods of explainable AI and interpretable ML cannot explain change in the values of an output variable for a statistical unit in terms of the change in the input values and the change in the "mechanism" (the function transforming input to output). We propose two methods based on counterfactuals for explaining unit-level changes at various input granularities using the concept of Shapley values from game theory. These methods satisfy two key axioms desirable for any unit-level change attribution method. Through simulations, we study the reliability and the scalability of the proposed methods. We get sensible results from a case study on identifying the drivers of the change in the earnings for individuals in the US.

59.7AIApr 6
IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery

Ivaxi Sheth, Zhijing Jin, Bryan Wilder et al.

In the presence of confounding between an endogenous variable and the outcome, instrumental variables (IVs) are used to isolate the causal effect of the endogenous variable. Identifying valid instruments requires interdisciplinary knowledge, creativity, and contextual understanding, making it a non-trivial task. In this paper, we investigate whether large language models (LLMs) can aid in this task. We perform a two-stage evaluation framework. First, we test whether LLMs can recover well-established instruments from the literature, assessing their ability to replicate standard reasoning. Second, we evaluate whether LLMs can identify and avoid instruments that have been empirically or theoretically discredited. Building on these results, we introduce IV Co-Scientist, a multi-agent system that proposes, critiques, and refines IVs for a given treatment-outcome pair. We also introduce a statistical test to contextualize consistency in the absence of ground truth. Our results show the potential of LLMs to discover valid instrumental variables from a large observational database.

AIJul 1, 2020Code
Quantifying intrinsic causal contributions via structure preserving interventions

Dominik Janzing, Patrick Blöbaum, Atalanti A. Mastakouri et al.

We propose a notion of causal influence that describes the `intrinsic' part of the contribution of a node on a target node in a DAG. By recursively writing each node as a function of the upstream noise terms, we separate the intrinsic information added by each node from the one obtained from its ancestors. To interpret the intrinsic information as a {\it causal} contribution, we consider `structure-preserving interventions' that randomize each node in a way that mimics the usual dependence on the parents and does not perturb the observed joint distribution. To get a measure that is invariant with respect to relabelling nodes we use Shapley based symmetrization and show that it reduces in the linear case to simple ANOVA after resolving the target node into noise variables. We describe our contribution analysis for variance and entropy, but contributions for other target metrics can be defined analogously. The code is available in the package gcm of the open source library DoWhy.

LGFeb 12, 2025
On Different Notions of Redundancy in Conditional-Independence-Based Discovery of Graphical Models

Philipp M. Faller, Dominik Janzing

Conditional-independence-based discovery uses statistical tests to identify a graphical model that represents the independence structure of variables in a dataset. These tests, however, can be unreliable, and algorithms are sensitive to errors and violated assumptions. Often, there are tests that were not used in the construction of the graph. In this work, we show that these redundant tests have the potential to detect or sometimes correct errors in the learned model. But we further show that not all tests contain this additional information and that such redundant tests have to be applied with care. Precisely, we argue that the conditional (in)dependence statements that hold for every probability distribution are unlikely to detect and correct errors - in contrast to those that follow only from graphical assumptions.

MLNov 8, 2024
Cross-validating causal discovery via Leave-One-Variable-Out

Daniela Schkoda, Philipp Faller, Patrick Blöbaum et al.

We propose a new approach to falsify causal discovery algorithms without ground truth, which is based on testing the causal model on a pair of variables that has been dropped when learning the causal model. To this end, we use the "Leave-One-Variable-Out (LOVO)" prediction where $Y$ is inferred from $X$ without any joint observations of $X$ and $Y$, given only training data from $X,Z_1,\dots,Z_k$ and from $Z_1,\dots,Z_k,Y$. We demonstrate that causal models on the two subsets, in the form of Acyclic Directed Mixed Graphs (ADMGs), often entail conclusions on the dependencies between $X$ and $Y$, enabling this type of prediction. The prediction error can then be estimated since the joint distribution $P(X, Y)$ is assumed to be available, and $X$ and $Y$ have only been omitted for the purpose of falsification. After presenting this graphical method, which is applicable to general causal discovery algorithms, we illustrate how to construct a LOVO predictor tailored towards algorithms relying on specific a priori assumptions, such as linear additive noise models. Simulations indicate that the LOVO prediction error is indeed correlated with the accuracy of the causal outputs, affirming the method's effectiveness.

MEApr 12, 2024
Multiply-Robust Causal Change Attribution

Victor Quintas-Martinez, Mohammad Taha Bahadori, Eduardo Santiago et al.

Comparing two samples of data, we observe a change in the distribution of an outcome variable. In the presence of multiple explanatory variables, how much of the change can be explained by each possible cause? We develop a new estimation strategy that, given a causal model, combines regression and re-weighting methods to quantify the contribution of each causal mechanism. Our proposed methodology is multiply robust, meaning that it still recovers the target parameter under partial misspecification. We prove that our estimator is consistent and asymptotically normal. Moreover, it can be incorporated into existing frameworks for causal attribution, such as Shapley values, which will inherit the consistency and large-sample distribution properties. Our method demonstrates excellent performance in Monte Carlo simulations, and we show its usefulness in an empirical application. Our method is implemented as part of the Python library DoWhy (arXiv:2011.04216, arXiv:2206.06821).

LGFeb 12, 2025
Toward Universal Laws of Outlier Propagation

Aram Ebtekar, Yuhao Wang, Dominik Janzing

When a variety of anomalous features motivate flagging different samples as outliers, Algorithmic Information Theory (AIT) offers a principled way to unify them in terms of a sample's randomness deficiency. Subject to the algorithmic Markov condition on a causal Bayesian network, we show that the randomness deficiency of a joint sample decomposes into a sum of randomness deficiencies at each causal mechanism. Consequently, anomalous observations can be attributed to their root causes, i.e., the mechanisms that behaved anomalously. As an extension of Levin's law of randomness conservation, we show that weak outliers cannot cause strong ones. We show how these information theoretic laws clarify our understanding of outlier detection and attribution, in the context of more specialized outlier scores from prior literature.

LGJan 14, 2025
Causal vs. Anticausal merging of predictors

Sergio Hernan Garrido Mejia, Patrick Blöbaum, Bernhard Schölkopf et al.

We study the differences arising from merging predictors in the causal and anticausal directions using the same data. In particular we study the asymmetries that arise in a simple model where we merge the predictors using one binary variable as target and two continuous variables as predictors. We use Causal Maximum Entropy (CMAXENT) as inductive bias to merge the predictors, however, we expect similar differences to hold also when we use other merging methods that take into account asymmetries between cause and effect. We show that if we observe all bivariate distributions, the CMAXENT solution reduces to a logistic regression in the causal direction and Linear Discriminant Analysis (LDA) in the anticausal direction. Furthermore, we study how the decision boundaries of these two solutions differ whenever we observe only some of the bivariate distributions implications for Out-Of-Variable (OOV) generalisation.

LGOct 16, 2025
From Guess2Graph: When and How Can Unreliable Experts Safely Boost Causal Discovery in Finite Samples?

Sujai Hiremath, Dominik Janzing, Philipp Faller et al.

Causal discovery algorithms often perform poorly with limited samples. While integrating expert knowledge (including from LLMs) as constraints promises to improve performance, guarantees for existing methods require perfect predictions or uncertainty estimates, making them unreliable for practical use. We propose the Guess2Graph (G2G) framework, which uses expert guesses to guide the sequence of statistical tests rather than replacing them. This maintains statistical consistency while enabling performance improvements. We develop two instantiations of G2G: PC-Guess, which augments the PC algorithm, and gPC-Guess, a learning-augmented variant designed to better leverage high-quality expert input. Theoretically, both preserve correctness regardless of expert error, with gPC-Guess provably outperforming its non-augmented counterpart in finite samples when experts are "better than random." Empirically, both show monotonic improvement with expert accuracy, with gPC-Guess achieving significantly stronger gains.

MLOct 8, 2025
Root Cause Analysis of Outliers in Unknown Cyclic Graphs

Daniela Schkoda, Dominik Janzing

We study the propagation of outliers in cyclic causal graphs with linear structural equations, tracing them back to one or several "root cause" nodes. We show that it is possible to identify a short list of potential root causes provided that the perturbation is sufficiently strong and propagates according to the same structural equations as in the normal mode. This shortlist consists of the true root causes together with those of its parents lying on a cycle with the root cause. Notably, our method does not require prior knowledge of the causal graph.

MLJun 7, 2024
Root Cause Analysis of Outliers with Missing Structural Knowledge

William Roy Orchard, Nastaran Okati, Sergio Hernan Garrido Mejia et al.

The goal of Root Cause Analysis (RCA) is to explain why an anomaly occurred by identifying where the fault originated. Several recent works model the anomalous event as resulting from a change in the causal mechanism at the root cause, i.e., as a soft intervention. RCA is then the task of identifying which causal mechanism changed. In real-world applications, one often has either few or only a single sample from the post-intervention distribution: a severe limitation for most methods, which assume one knows or can estimate the distribution. However, even those that do not are statistically ill-posed due to the need to probe regression models in regions of low probability density. In this paper, we propose simple, efficient methods to overcome both difficulties in the case where there is a single root cause and the causal graph is a polytree. When one knows the causal graph, we give guarantees for a traversal algorithm that requires only marginal anomaly scores and does not depend on specifying an arbitrary anomaly score cut-off. When one does not know the causal graph, we show that the heuristic of identifying root causes as the variables with the highest marginal anomaly scores is causally justified. To this end, we prove that anomalies with small scores are unlikely to cause those with larger scores in polytrees and give upper bounds for the likelihood of causal pathways with non-monotonic anomaly scores.

MLMay 16, 2023
Toward Falsifying Causal Graphs Using a Permutation-Based Test

Elias Eulig, Atalanti A. Mastakouri, Patrick Blöbaum et al.

Understanding causal relationships among the variables of a system is paramount to explain and control its behavior. For many real-world systems, however, the true causal graph is not readily available and one must resort to predictions made by algorithms or domain experts. Therefore, metrics that quantitatively assess the goodness of a causal graph provide helpful checks before using it in downstream tasks. Existing metrics provide an $\textit{absolute}$ number of inconsistencies between the graph and the observed data, and without a baseline, practitioners are left to answer the hard question of how many such inconsistencies are acceptable or expected. Here, we propose a novel consistency metric by constructing a baseline through node permutations. By comparing the number of inconsistencies with those on the baseline, we derive an interpretable metric that captures whether the graph is significantly better than random. Evaluating on both simulated and real data sets from various domains, including biology and cloud monitoring, we demonstrate that the true graph is not falsified by our metric, whereas the wrong graphs given by a hypothetical user are likely to be falsified.

MLMay 11, 2023
Reinterpreting causal discovery as the task of predicting unobserved joint statistics

Dominik Janzing, Philipp M. Faller, Leena Chennuru Vankadara

If $X,Y,Z$ denote sets of random variables, two different data sources may contain samples from $P_{X,Y}$ and $P_{Y,Z}$, respectively. We argue that causal discovery can help inferring properties of the `unobserved joint distributions' $P_{X,Y,Z}$ or $P_{X,Z}$. The properties may be conditional independences (as in `integrative causal inference') or also quantitative statements about dependences. More generally, we define a learning scenario where the input is a subset of variables and the label is some statistical property of that subset. Sets of jointly observed variables define the training points, while unobserved sets are possible test points. To solve this learning task, we infer, as an intermediate step, a causal model from the observations that then entails properties of unobserved sets. Accordingly, we can define the VC dimension of a class of causal models and derive generalization bounds for the predictions. Here, causal discovery becomes more modest and better accessible to empirical tests than usual: rather than trying to find a causal hypothesis that is `true' a causal hypothesis is {\it useful} whenever it correctly predicts statistical properties of unobserved joint distributions. This way, a sparse causal graph that omits weak influences may be more useful than a dense one (despite being less accurate) because it is able to reconstruct the full joint distribution from marginal distributions of smaller subsets. Within such a `pragmatic' application of causal discovery, some popular heuristic approaches become justified in retrospect. It is, for instance, allowed to infer DAGs from partial correlations instead of conditional independences if the DAGs are only used to predict partial correlations.

LGMay 10, 2023
Causal Information Splitting: Engineering Proxy Features for Robustness to Distribution Shifts

Bijan Mazaheri, Atalanti Mastakouri, Dominik Janzing et al.

Statistical prediction models are often trained on data from different probability distributions than their eventual use cases. One approach to proactively prepare for these shifts harnesses the intuition that causal mechanisms should remain invariant between environments. Here we focus on a challenging setting in which the causal and anticausal variables of the target are unobserved. Leaning on information theory, we develop feature selection and engineering techniques for the observed downstream variables that act as proxies. We identify proxies that help to build stable models and moreover utilize auxiliary training tasks to answer counterfactual questions that extract stability-enhancing information from proxies. We demonstrate the effectiveness of our techniques on synthetic and real data.

MEFeb 23, 2022
Testing Granger Non-Causality in Panels with Cross-Sectional Dependencies

Lenon Minorics, Caner Turkmen, David Kernert et al.

This paper proposes a new approach for testing Granger non-causality on panel data. Instead of aggregating panel member statistics, we aggregate their corresponding p-values and show that the resulting p-value approximately bounds the type I error by the chosen significance level even if the panel members are dependent. We compare our approach against the most widely used Granger causality algorithm on panel data and show that our approach yields lower FDR at the same power for large sample sizes and panels with cross-sectional dependencies. Finally, we examine COVID-19 data about confirmed cases and deaths measured in countries/regions worldwide and show that our approach is able to discover the true causal relation between confirmed cases and deaths while state-of-the-art approaches fail.

MLFeb 4, 2022
Correcting Confounding via Random Selection of Background Variables

You-Lin Chen, Lenon Minorics, Dominik Janzing

We propose a method to distinguish causal influence from hidden confounding in the following scenario: given a target variable Y, potential causal drivers X, and a large number of background features, we propose a novel criterion for identifying causal relationship based on the stability of regression coefficients of X on Y with respect to selecting different background features. To this end, we propose a statistic V measuring the coefficient's variability. We prove, subject to a symmetry assumption for the background influence, that V converges to zero if and only if X contains no causal drivers. In experiments with simulated data, the method outperforms state of the art algorithms. Further, we report encouraging results for real-world data. Our approach aligns with the general belief that causal insights admit better generalization of statistical associations across environments, and justifies similar existing heuristic approaches from the literature.

AIFeb 2, 2022
Causal Inference Through the Structural Causal Marginal Problem

Luigi Gresele, Julius von Kügelgen, Jonas M. Kübler et al.

We introduce an approach to counterfactual inference based on merging information from multiple datasets. We consider a causal reformulation of the statistical marginal problem: given a collection of marginal structural causal models (SCMs) over distinct but overlapping sets of variables, determine the set of joint SCMs that are counterfactually consistent with the marginal ones. We formalise this approach for categorical SCMs using the response function formulation and show that it reduces the space of allowed marginal and joint SCMs. Our work thus highlights a new mode of falsifiability through additional variables, in contrast to the statistical one via additional data.

MLNov 18, 2021
Causal Forecasting:Generalization Bounds for Autoregressive Models

Leena Chennuru Vankadara, Philipp Michael Faller, Michaela Hardt et al.

Despite the increasing relevance of forecasting methods, causal implications of these algorithms remain largely unexplored. This is concerning considering that, even under simplifying assumptions such as causal sufficiency, the statistical risk of a model can differ significantly from its \textit{causal risk}. Here, we study the problem of \textit{causal generalization} -- generalizing from the observational to interventional distributions -- in forecasting. Our goal is to find answers to the question: How does the efficacy of an autoregressive (VAR) model in predicting statistical associations compare with its ability to predict under interventions? To this end, we introduce the framework of \textit{causal learning theory} for forecasting. Using this framework, we obtain a characterization of the difference between statistical and causal risks, which helps identify sources of divergence between them. Under causal sufficiency, the problem of causal generalization amounts to learning under covariate shifts, albeit with additional structure (restriction to interventional distributions under the VAR model). This structure allows us to obtain uniform convergence bounds on causal generalizability for the class of VAR models. To the best of our knowledge, this is the first work that provides theoretical guarantees for causal generalization in the time-series setting.

MEOct 29, 2021
Cause-effect inference through spectral independence in linear dynamical systems: theoretical foundations

Michel Besserve, Naji Shajarisales, Dominik Janzing et al.

Distinguishing between cause and effect using time series observational data is a major challenge in many scientific fields. A new perspective has been provided based on the principle of Independence of Causal Mechanisms (ICM), leading to the Spectral Independence Criterion (SIC), postulating that the power spectral density (PSD) of the cause time series is uncorrelated with the squared modulus of the frequency response of the filter generating the effect. Since SIC rests on methods and assumptions in stark contrast with most causal discovery methods for time series, it raises questions regarding what theoretical grounds justify its use. In this paper, we provide answers covering several key aspects. After providing an information theoretic interpretation of SIC, we present an identifiability result that sheds light on the context for which this approach is expected to perform well. We further demonstrate the robustness of SIC to downsampling - an obstacle that can spoil Granger-based inference. Finally, an invariance perspective allows to explore the limitations of the spectral independence assumption and how to generalize it. Overall, these results support the postulate of Spectral Independence is a well grounded leading principle for causal inference based on empirical time series.

LGOct 11, 2021
You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction

Osama Makansi, Julius von Kügelgen, Francesco Locatello et al.

Predicting the future trajectory of a moving agent can be easy when the past trajectory continues smoothly but is challenging when complex interactions with other agents are involved. Recent deep learning approaches for trajectory prediction show promising performance and partially attribute this to successful reasoning about agent-agent interactions. However, it remains unclear which features such black-box models actually learn to use for making predictions. This paper proposes a procedure that quantifies the contributions of different cues to model performance based on a variant of Shapley values. Applying this procedure to state-of-the-art trajectory prediction methods on standard benchmark datasets shows that they are, in fact, unable to reason about interactions. Instead, the past trajectory of the target is the only feature used for predicting its future. For a task with richer social interaction patterns, on the other hand, the tested models do pick up such interactions to a certain extent, as quantified by our feature attribution method. We discuss the limits of the proposed method and its links to causality

MEJul 15, 2021
Obtaining Causal Information by Merging Datasets with MAXENT

Sergio Hernan Garrido Mejia, Elke Kirschbaum, Dominik Janzing

The investigation of the question "which treatment has a causal effect on a target variable?" is of particular relevance in a large number of scientific disciplines. This challenging task becomes even more difficult if not all treatment variables were or even cannot be observed jointly with the target variable. Another similarly important and challenging task is to quantify the causal influence of a treatment on a target in the presence of confounders. In this paper, we discuss how causal knowledge can be obtained without having observed all variables jointly, but by merging the statistical information from different datasets. We show how the maximum entropy principle can be used to identify edges among random variables when assuming causal sufficiency and an extended version of faithfulness, and when only subsets of the variables have been observed jointly.

MEFeb 26, 2021
Why did the distribution change?

Kailash Budhathoki, Dominik Janzing, Patrick Bloebaum et al.

We describe a formal approach based on graphical causal models to identify the "root causes" of the change in the probability distribution of variables. After factorizing the joint distribution into conditional distributions of each variable, given its parents (the "causal mechanisms"), we attribute the change to changes of these causal mechanisms. This attribution analysis accounts for the fact that mechanisms often change independently and sometimes only some of them change. Through simulations, we study the performance of our distribution change attribution method. We then present a real-world case study identifying the drivers of the difference in the income distribution between men and women.

MLFeb 7, 2021
Causal versions of Maximum Entropy and Principle of Insufficient Reason

Dominik Janzing

The Principle of Insufficient Reason (PIR) assigns equal probabilities to each alternative of a random experiment whenever there is no reason to prefer one over the other. The Maximum Entropy Principle (MaxEnt) generalizes PIR to the case where statistical information like expectations are given. It is known that both principles result in paradoxical probability updates for joint distributions of cause and effect. This is because constraints on the conditional P(effect|cause) result in changes of P(cause) that assign higher probability to those values of the cause that offer more options for the effect, suggesting "intentional behaviour". Earlier work therefore suggested sequentially maximizing (conditional) entropy according to the causal order, but without further justification apart from plausibility on toy examples. We justify causal modifications of PIR and MaxEnt by separating constraints into restrictions for the cause and restrictions for the mechanism that generates the effect from the cause. We further sketch why Causal PIR also entails "Information Geometric Causal Inference". We briefly discuss problems of generalizing the causal version of MaxEnt to arbitrary causal DAGs.

MEMay 18, 2020
Necessary and sufficient conditions for causal feature selection in time series with latent common causes

Atalanti A. Mastakouri, Bernhard Schölkopf, Dominik Janzing

We study the identification of direct and indirect causes on time series and provide conditions in the presence of latent variables, which we prove to be necessary and sufficient under some graph constraints. Our theoretical results and estimation algorithms require two conditional independence tests for each observed candidate time series to determine whether or not it is a cause of an observed target time series. We provide experimental results in simulations, as well as real data. Our results show that our method leads to very low false positives and relatively low false negative rates, outperforming the widely used Granger causality.

LGApr 1, 2020
A theory of independent mechanisms for extrapolation in generative models

Michel Besserve, Rémy Sun, Dominik Janzing et al.

Generative models can be trained to emulate complex empirical data, but are they useful to make predictions in the context of previously unobserved environments? An intuitive idea to promote such extrapolation capabilities is to have the architecture of such model reflect a causal graph of the true data generating process, such that one can intervene on each node independently of the others. However, the nodes of this graph are usually unobserved, leading to overparameterization and lack of identifiability of the causal structure. We develop a theoretical framework to address this challenging situation by defining a weaker form of identifiability, based on the principle of independence of mechanisms. We demonstrate on toy examples that classical stochastic gradient descent can hinder the model's extrapolation capabilities, suggesting independence of mechanisms should be enforced explicitly during training. Experiments on deep generative models trained on real world data support these insights and illustrate how the extrapolation capabilities of such models can be leveraged.

MLDec 5, 2019
Causal structure based root cause analysis of outliers

Dominik Janzing, Kailash Budhathoki, Lenon Minorics et al.

We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.

MLOct 29, 2019
Feature relevance quantification in explainable AI: A causal problem

Dominik Janzing, Lenon Minorics, Patrick Blöbaum

We discuss promising recent contributions on quantifying feature relevance using Shapley values, where we observed some confusion on which probability distribution is the right one for dropped features. We argue that the confusion is based on not carefully distinguishing between observational and interventional conditional probabilities and try a clarification based on Pearl's seminal work on causality. We conclude that unconditional rather than conditional expectations provide the right notion of dropping features in contradiction to the theoretical justification of the software package SHAP. Parts of SHAP are unaffected because unconditional expectations (which we argue to be conceptually right) are used as approximation for the conditional ones, which encouraged others to `improve' SHAP in a way that we believe to be flawed.

MLJun 28, 2019
Causal Regularization

Dominik Janzing

I argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also yield better causal models in the infinite sample regime. I first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. Choosing the size of the penalizing term, is however challenging, because cross validation is pointless. Here it is done by first estimating the strength of confounding via a method proposed earlier, which yielded some reasonable results for simulated and real data. Further, I prove a `causal generalization bound' which states (subject to a particular model of confounding) that the error made by interpreting any non-linear regression as causal model can be bounded from above whenever functions are taken from a not too rich class. In other words, the bound guarantees "generalization" from observational to interventional distributions, which is usually not subject of statistical learning theory (and is only possible due to the underlying symmetries of the confounder model).

MLMar 2, 2018
Detecting non-causal artifacts in multivariate linear regression models

Dominik Janzing, Bernhard Schoelkopf

We consider linear models where $d$ potential causes $X_1,...,X_d$ are correlated with one target quantity $Y$ and propose a method to infer whether the association is causal or whether it is an artifact caused by overfitting or hidden common causes. We employ the idea that in the former case the vector of regression coefficients has 'generic' orientation relative to the covariance matrix $Σ_{XX}$ of $X$. Using an ICA based model for confounding, we show that both confounding and overfitting yield regression vectors that concentrate mainly in the space of low eigenvalues of $Σ_{XX}$.

AIFeb 19, 2018
Analysis of cause-effect inference by comparing regression errors

Patrick Blöbaum, Dominik Janzing, Takashi Washio et al.

We address the problem of inferring the causal direction between two variables by comparing the least-squares errors of the predictions in both possible directions. Under the assumption of an independence between the function relating cause and effect, the conditional noise distribution, and the distribution of the cause, we show that the errors are smaller in causal direction if both variables are equally scaled and the causal relation is close to deterministic. Based on this, we provide an easily applicable algorithm that only requires a regression in both possible causal directions and a comparison of the errors. The performance of the algorithm is compared with various related causal inference methods in different artificial and real-world data sets.

MLJul 4, 2017
Causal Consistency of Structural Equation Models

Paul K. Rubenstein, Sebastian Weichwald, Stephan Bongers et al.

Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs. This provides a general language to consider, for instance, the different levels of description in the following three scenarios: (a) models with large numbers of variables versus models in which the `irrelevant' or unobservable variables have been marginalised out; (b) micro-level models versus macro-level models in which the macro-variables are aggregate features of the micro-variables; (c) dynamical time series models versus models of their stationary behaviour. Our analysis stresses the importance of well specified interventions in the causal modelling process and sheds light on the interpretation of cyclic SEMs.

MLJun 8, 2017
Avoiding Discrimination through Causal Reasoning

Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo et al.

Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively. Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from "What is the right fairness criterion?" to "What do we want to assume about the causal data generating process?" Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them.

MLMay 5, 2017
Group invariance principles for causal generative models

Michel Besserve, Naji Shajarisales, Bernhard Schölkopf et al.

The postulate of independence of cause and mechanism (ICM) has recently led to several new causal discovery algorithms. The interpretation of independence and the way it is utilized, however, varies across these methods. Our aim in this paper is to propose a group theoretic framework for ICM to unify and generalize these approaches. In our setting, the cause-mechanism relationship is assessed by comparing it against a null hypothesis through the application of random generic group transformations. We show that the group theoretic view provides a very general tool to study the structure of data generating mechanisms with direct applications to machine learning.

MLApr 5, 2017
Detecting confounding in multivariate linear models via spectral analysis

Dominik Janzing, Bernhard Schoelkopf

We study a model where one target variable Y is correlated with a vector X:=(X_1,...,X_d) of predictor variables being potential causes of Y. We describe a method that infers to what extent the statistical dependences between X and Y are due to the influence of X on Y and to what extent due to a hidden common cause (confounder) of X and Y. The method relies on concentration of measure results for large dimensions d and an independence assumption stating that, in the absence of confounding, the vector of regression coefficients describing the influence of each X on Y typically has `generic orientation' relative to the eigenspaces of the covariance matrix of X. For the special case of a scalar confounder we show that confounding typically spoils this generic orientation in a characteristic way that can be used to quantitatively estimate the amount of confounding.

MLMay 12, 2015
Removing systematic errors for exoplanet search via latent causes

Bernhard Schölkopf, David W. Hogg, Dun Wang et al.

We describe a method for removing the effect of confounders in order to reconstruct a latent quantity of interest. The method, referred to as half-sibling regression, is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification and illustrate the potential of the method in a challenging astronomy application.

AIMar 4, 2015
Telling cause from effect in deterministic linear dynamical systems

Naji Shajarisales, Dominik Janzing, Bernhard Shoelkopf et al.

Inferring a cause from its effect using observed time series data is a major challenge in natural and social sciences. Assuming the effect is generated by the cause trough a linear system, we propose a new approach based on the hypothesis that nature chooses the "cause" and the "mechanism that generates the effect from the cause" independent of each other. We therefore postulate that the power spectrum of the time series being the cause is uncorrelated with the square of the transfer function of the linear filter generating the effect. While most causal discovery methods for time series mainly rely on the noise, our method relies on asymmetries of the power spectral density properties that can be exploited even in the context of deterministic systems. We describe mathematical assumptions in a deterministic model under which the causal direction is identifiable with this approach. We also discuss the method's performance under the additive noise model and its relationship to Granger causality. Experiments show encouraging results on synthetic as well as real-world data. Overall, this suggests that the postulate of Independence of Cause and Mechanism is a promising principle for causal inference on empirical time series.

LGDec 11, 2014
Distinguishing cause from effect using observational data: methods and benchmarks

Joris M. Mooij, Jonas Peters, Dominik Janzing et al.

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63+-10 % and an AUC of 0.74+-0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method.

MLNov 14, 2014
Causal Inference by Identification of Vector Autoregressive Processes with Hidden Components

Philipp Geiger, Kun Zhang, Mingming Gong et al.

A widely applied approach to causal inference from a non-experimental time series $X$, often referred to as "(linear) Granger causal analysis", is to regress present on past and interpret the regression matrix $\hat{B}$ causally. However, if there is an unmeasured time series $Z$ that influences $X$, then this approach can lead to wrong causal conclusions, i.e., distinct from those one would draw if one had additional information such as $Z$. In this paper we take a different approach: We assume that $X$ together with some hidden $Z$ forms a first order vector autoregressive (VAR) process with transition matrix $A$, and argue why it is more valid to interpret $A$ causally instead of $\hat{B}$. Then we examine under which conditions the most important parts of $A$ are identifiable or almost identifiable from only $X$. Essentially, sufficient conditions are (1) non-Gaussian, independent noise or (2) no influence from $X$ to $Z$. We present two estimation algorithms that are tailored towards conditions (1) and (2), respectively, and evaluate them on synthetic and real-world data. We discuss how to check the model using $X$.

AIAug 9, 2014
From Ordinary Differential Equations to Structural Causal Models: the deterministic case

Joris Mooij, Dominik Janzing, Bernhard Schoelkopf

We show how, and under which conditions, the equilibrium states of a first-order Ordinary Differential Equation (ODE) system can be described with a deterministic Structural Causal Model (SCM). Our exposition sheds more light on the concept of causality as expressed within the framework of Structural Causal Models, especially for cyclic models.

QUANT-PHJun 19, 2014
Inferring causal structure: a quantum advantage

Katja Ried, Megan Agnew, Lydia Vermeyden et al.

The problem of using observed correlations to infer causal relations is relevant to a wide variety of scientific disciplines. Yet given correlations between just two classical variables, it is impossible to determine whether they arose from a causal influence of one on the other or a common cause influencing both, unless one can implement a randomized intervention. We here consider the problem of causal inference for quantum variables. We introduce causal tomography, which unifies and generalizes conventional quantum tomography schemes to provide a complete solution to the causal inference problem using a quantum analogue of a randomized trial. We furthermore show that, in contrast to the classical case, observed quantum correlations alone can sometimes provide a solution. We implement a quantum-optical experiment that allows us to control the causal relation between two optical modes, and two measurement schemes -- one with and one without randomization -- that extract this relation from the observed correlations. Our results show that entanglement and coherence, known to be central to quantum information processing, also provide a quantum advantage for causal inference.

MLFeb 11, 2014
Justifying Information-Geometric Causal Inference

Dominik Janzing, Bastian Steudel, Naji Shajarisales et al.

Information Geometric Causal Inference (IGCI) is a new approach to distinguish between cause and effect for two variables. It is based on an independence assumption between input distribution and causal mechanism that can be phrased in terms of orthogonality in information space. We describe two intuitive reinterpretations of this approach that makes IGCI more accessible to a broader audience. Moreover, we show that the described independence is related to the hypothesis that unsupervised learning and semi-supervised learning only works for predicting the cause from the effect and not vice versa.

LGDec 19, 2013
Consistency of Causal Inference under the Additive Noise Model

Samory Kpotufe, Eleni Sgouritsa, Dominik Janzing et al.

We analyze a family of methods for statistical causal inference from sample under the so-called Additive Noise Model. While most work on the subject has concentrated on establishing the soundness of the Additive Noise Model, the statistical consistency of the resulting inference methods has received little attention. We derive general conditions under which the given family of inference methods consistently infers the causal direction in a nonparametric setting.