Causal Inference by Identification of Vector Autoregressive Processes with Hidden Components
This work addresses the incorrect causal conclusions that time series analysis can produce when hidden confounders are present, and offers researchers and practitioners a new methodological approach.
The paper tackles causal inference from non-experimental time series in the presence of hidden variables. It proposes a causal interpretation of the transition matrix of a vector autoregressive process with hidden components, and shows that key parts of this matrix are identifiable or almost identifiable under conditions such as non-Gaussian noise or no influence from observed to hidden variables. The proposed algorithms are evaluated on synthetic and real-world data.
A widely applied approach to causal inference from a non-experimental time series $X$, often referred to as "(linear) Granger causal analysis", is to regress the present on the past and interpret the regression matrix $\hat{B}$ causally. However, if there is an unmeasured time series $Z$ that influences $X$, then this approach can lead to wrong causal conclusions, i.e., conclusions distinct from those one would draw given additional information such as $Z$. In this paper we take a different approach: we assume that $X$ together with some hidden $Z$ forms a first-order vector autoregressive (VAR) process with transition matrix $A$, and argue why it is more valid to interpret $A$ causally than $\hat{B}$. We then examine under which conditions the most important parts of $A$ are identifiable or almost identifiable from $X$ alone. Essentially, sufficient conditions are (1) non-Gaussian, independent noise or (2) no influence from $X$ to $Z$. We present two estimation algorithms, tailored to conditions (1) and (2) respectively, and evaluate them on synthetic and real-world data. We also discuss how to check the model using $X$.
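To make the gap between $\hat{B}$ and $A$ concrete, here is a small illustration (not the paper's estimation algorithms). Write the full process as $V_t = (X_t, Z_t)$ with $V_t = A V_{t-1} + N_t$, partition $A$ into blocks $A_{XX}, A_{XZ}, A_{ZX}, A_{ZZ}$, and let $\Sigma$ be the stationary covariance of $V_t$, which solves $\Sigma = A \Sigma A^\top + \mathrm{Cov}(N_t)$. Since $\mathrm{Cov}(V_t, V_{t-1}) = A\Sigma$, the least-squares regression of $X_t$ on $X_{t-1}$ converges to

$$\hat{B} \to A_{XX} + A_{XZ}\,\Sigma_{ZX}\,\Sigma_{XX}^{-1},$$

so the Granger estimate is biased whenever the hidden $Z$ influences $X$ ($A_{XZ} \neq 0$) and is correlated with it. The sketch below simulates one such process and compares both quantities with $A_{XX}$. The helper names (`simulate_var1`, `granger_regression`, `stationary_cov`, `population_granger`), the chosen matrix $A$, and the use of Gaussian noise are hypothetical choices made only for this illustration. Note that the chosen $A$ has $A_{ZX} = 0$, i.e., it satisfies condition (2), yet plain regression on $X$ still misestimates $A_{XX}$.

```python
import numpy as np


def simulate_var1(A, n_steps, rng, burn_in=500):
    """Simulate V_t = A V_{t-1} + N_t with i.i.d. standard Gaussian noise N_t.

    Gaussian noise is a simplifying assumption for this illustration only.
    """
    dim = A.shape[0]
    v = np.zeros(dim)
    out = np.empty((n_steps, dim))
    for t in range(burn_in + n_steps):
        v = A @ v + rng.standard_normal(dim)
        if t >= burn_in:  # discard the burn-in so samples are ~stationary
            out[t - burn_in] = v
    return out


def granger_regression(X):
    """Least-squares estimate of B in X_t ~ B X_{t-1} (no intercept)."""
    past, present = X[:-1], X[1:]
    # Solve past @ B.T ~ present in the least-squares sense.
    B_T, *_ = np.linalg.lstsq(past, present, rcond=None)
    return B_T.T


def stationary_cov(A, noise_cov):
    """Solve Sigma = A Sigma A^T + noise_cov via the Kronecker-product identity."""
    d = A.shape[0]
    vec_sigma = np.linalg.solve(np.eye(d * d) - np.kron(A, A), noise_cov.reshape(-1))
    return vec_sigma.reshape(d, d)


def population_granger(A, noise_cov, n_obs):
    """Large-sample limit of B_hat: A_XX + A_XZ Sigma_ZX Sigma_XX^{-1}.

    The first n_obs coordinates are observed (X), the rest are hidden (Z).
    """
    sigma = stationary_cov(A, noise_cov)
    A_XX, A_XZ = A[:n_obs, :n_obs], A[:n_obs, n_obs:]
    S_XX, S_ZX = sigma[:n_obs, :n_obs], sigma[n_obs:, :n_obs]
    return A_XX + A_XZ @ S_ZX @ np.linalg.inv(S_XX)


rng = np.random.default_rng(0)
# Coordinates ordered (X, Z): Z drives X with weight 0.4, X does not drive Z.
A = np.array([[0.5, 0.4],
              [0.0, 0.8]])
V = simulate_var1(A, n_steps=20_000, rng=rng)
X = V[:, :1]  # only X is observed

B_hat = granger_regression(X)
B_pop = population_granger(A, np.eye(2), n_obs=1)
print("true A_XX:", A[0, 0])
print("Granger estimate B_hat:", B_hat[0, 0])
print("population limit of B_hat:", B_pop[0, 0])  # 79/110, about 0.718
```

For this $A$ the population limit works out to $0.5 + 0.4 \cdot \Sigma_{ZX}/\Sigma_{XX} = 79/110 \approx 0.718$, so the sample estimate should land near $0.72$ rather than the true self-influence $0.5$.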