LG, ME | Mar 14, 2023

Testing Causality for High Dimensional Data

arXiv:2303.07774v1 | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses causal inference between high-dimensional variables, a problem that arises in scientific discovery. It is an incremental advance that refines existing theory.

The paper revisits the linear trace method for inferring the causal direction between high-dimensional variables. It strengthens existing results with an improved tail analysis, extends them to nonlinear trace functionals with sharper confidence bounds under certain distributional assumptions, and supports the theory with experiments on synthetic datasets.

Determining causal relationships between high-dimensional observations is among the most important tasks in scientific discovery. In this paper, we revisit the linear trace method, a technique proposed by Janzing et al. (2009) and Zscheischler et al. (2011) to infer the causal direction between two high-dimensional random variables. We significantly strengthen the existing results by providing an improved tail analysis and by extending the results to nonlinear trace functionals, with sharper confidence bounds under certain distributional assumptions. We obtain our results by interpreting the trace estimator in the causal regime as a function over random orthogonal matrices, to which concentration results for Lipschitz functions on that space can be applied. We additionally propose a novel ridge-regularized variant of the estimator of Zscheischler et al. (2011), and give provable bounds relating the ridge-estimated terms to their ground-truth counterparts. We support our theoretical results with encouraging experiments on synthetic datasets, most notably in the high-dimension, low-sample-size regime.
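To make the trace condition behind the linear trace method concrete, here is a minimal numerical sketch. It is not the paper's estimator. It assumes a linear model Y = AX with population quantities known, a noise-free backward model (so the backward map is the inverse of A), and a covariance of X rotated by a Haar-random orthogonal matrix so that it is "independent" of A. The statistic used is the one from the original trace method, Delta = log tau(A Sigma A^T) - log tau(A A^T) - log tau(Sigma), with tau the normalized trace. The names `tau`, `trace_delta`, and `random_orthogonal`, and all numeric choices, are illustrative assumptions.

```python
import numpy as np


def tau(M):
    """Normalized trace tr(M) / n of a square matrix."""
    return np.trace(M) / M.shape[0]


def trace_delta(A, sigma):
    """Delta = log tau(A Sigma A^T) - log tau(A A^T) - log tau(Sigma)."""
    return (
        np.log(tau(A @ sigma @ A.T))
        - np.log(tau(A @ A.T))
        - np.log(tau(sigma))
    )


def random_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR with sign correction."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    # Rescale each column by the sign of the matching diagonal entry of R.
    return Q * np.sign(np.diag(R))


rng = np.random.default_rng(0)
n = 200

# Assumed covariance of X: a spread-out spectrum in a random orientation.
eigvals = np.linspace(0.1, 10.0, n)
U = random_orthogonal(n, rng)
sigma_x = U @ np.diag(eigvals) @ U.T

# Assumed structure matrix, fixed independently of sigma_x.
A = np.diag(np.linspace(0.5, 2.0, n))

# Causal direction X -> Y: Delta should concentrate near zero.
delta_forward = trace_delta(A, sigma_x)

# Anti-causal direction Y -> X (noise-free, so the backward map is A^{-1}).
sigma_y = A @ sigma_x @ A.T
delta_backward = trace_delta(np.linalg.inv(A), sigma_y)

print(f"Delta (X -> Y): {delta_forward:.4f}")
print(f"Delta (Y -> X): {delta_backward:.4f}")
```

Under these assumptions the forward statistic stays close to zero while the backward one is clearly negative, which is the asymmetry the method exploits. With finite samples and fewer observations than dimensions, A and the covariance must be estimated, which is where a regularized estimator such as the ridge variant mentioned in the abstract would come in.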

