Causal Order Identification to Address Confounding: Binary Variables
This work addresses a practical limitation in causal inference for binary data and offers a more robust method for researchers in fields such as epidemiology or the social sciences, though it is an incremental improvement on the existing LiNGAM framework.
The paper tackles the problem of identifying the causal order among binary variables when confounding is present. It extends the LiNGAM framework by minimizing the mutual information among the noise terms and reduces the computation needed to find a globally optimal order. Experiments show significantly more accurate causal order detection, especially when confounding is present.
This paper considers an extension of the linear non-Gaussian acyclic model (LiNGAM) that determines the causal order among variables from a dataset when the variables are expressed by a set of linear equations, including noise. In particular, we assume that the variables are binary. The existing LiNGAM assumes that no confounding is present, which is restrictive in practice. Based on the concept of independent component analysis (ICA), this paper proposes an extended framework in which the mutual information among the noises is minimized. Another significant contribution is to reduce the search for the causal order to a shortest path problem. The distance between each pair of nodes expresses an associated mutual information value, and the path with the minimum sum (KL divergence) is sought. Although, for $p$ variables, $p!$ mutual information values would in principle have to be compared, this paper dramatically reduces the computation when no confounding is present. The proposed algorithm finds the globally optimal solution, whereas existing methods seek the order locally and greedily, based on hypothesis testing. For mutual information estimation, we use the best estimator in the sense of Bayes/MDL, which correctly detects independence. Experiments using artificial and actual data show that the proposed version of LiNGAM achieves significantly better performance, particularly when confounding is present.
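To make the role of a Bayes/MDL-style mutual information estimator more concrete, the following is a minimal Python sketch for a pair of binary variables. It is an illustration under stated assumptions, not the paper's implementation: the function names `empirical_mi` and `mdl_mi` are hypothetical, and the penalty form $(\alpha-1)(\beta-1)\log n/(2n)$, with $\alpha$ and $\beta$ the numbers of values each variable takes, is an assumed MDL-style correction that the abstract does not spell out. The point it shows is that the plug-in estimate is never negative and is typically slightly positive even for independent samples, whereas a penalized estimate clipped at zero can return exactly zero, which is what allows it to signal independence.

```python
import math
from collections import Counter


def empirical_mi(xs, ys):
    """Plug-in (maximum-likelihood) mutual information estimate, in nats."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) * log( (c/n) / ((px/n) * (py/n)) )
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mdl_mi(xs, ys, alpha=2, beta=2):
    """Penalized estimate, clipped at zero.

    Assumption: the penalty (alpha-1)(beta-1) log(n) / (2n) is an
    MDL-style correction chosen for illustration; alpha = beta = 2
    corresponds to binary variables.
    """
    n = len(xs)
    penalty = (alpha - 1) * (beta - 1) * math.log(n) / (2 * n)
    return max(0.0, empirical_mi(xs, ys) - penalty)
```

For example, calling `mdl_mi(xs, ys)` on two perfectly balanced, empirically independent binary samples returns `0.0`, while calling it on two identical balanced samples returns a value close to $\log 2$ minus the penalty.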