ML | AI | Oct 5, 2025

Scalable Causal Discovery from Recursive Nonlinear Data via Truncated Basis Function Scores and Tests

arXiv:2510.04276v2 | 2 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses scalability in causal discovery for researchers and practitioners working with large, nonlinear datasets. It is an incremental advance that integrates basis-function scores and tests into hybrid search methods.

The paper tackles scalable causal discovery from nonlinear data by introducing two basis-expansion tools, BF-BIC and BF-LRT. In simulations, BF-BIC outperforms kernel- and constraint-based methods in both accuracy and runtime, while BF-LRT is substantially faster than kernel tests at competitive accuracy. The methods are also applied to real data on Canadian wildfire risk.

Learning graphical conditional independence structures from nonlinear, continuous or mixed data is a central challenge in machine learning and the sciences, and many existing methods struggle to scale to thousands of samples or hundreds of variables. We introduce two basis-expansion tools for scalable causal discovery. First, the Basis Function BIC (BF-BIC) score uses truncated additive expansions to approximate nonlinear dependencies. BF-BIC is theoretically consistent under additive models and extends to post-nonlinear (PNL) models via an invertible reparameterization. It remains robust under moderate interactions and supports mixed data through a degenerate-Gaussian embedding for discrete variables. In simulations with fully nonlinear neural causal models (NCMs), BF-BIC outperforms kernel- and constraint-based methods (e.g., KCI, RFCI) in both accuracy and runtime. Second, the Basis Function Likelihood Ratio Test (BF-LRT) provides an approximate conditional independence test that is substantially faster than kernel tests while retaining competitive accuracy. Extensive simulations and a real-data application to Canadian wildfire risk show that, when integrated into hybrid searches, BF-based methods enable interpretable and scalable causal discovery. Implementations are available in Python, R, and Java.
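To make the two tools in the abstract concrete, here is a minimal Python sketch of the general idea: expand each predictor in a truncated basis, fit by least squares, and either score the fit with BIC or compare nested fits with a likelihood ratio. It is not the authors' implementation. The Legendre basis, the truncation degree of 3, the Gaussian residual likelihood, the chi-square reference distribution, and the names `basis_expand`, `design`, `bf_bic_local`, and `bf_lrt` are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.stats import chi2


def basis_expand(x, degree):
    """Truncated Legendre expansion of one variable (degrees 1..degree).

    The variable is rescaled to [-1, 1] first. The choice of basis is an
    assumption of this sketch, not something stated in the abstract.
    """
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span > 0:
        z = 2.0 * (x - x.min()) / span - 1.0
    else:
        z = np.zeros_like(x)
    # legvander returns columns for degrees 0..degree; drop the constant.
    return legendre.legvander(z, degree)[:, 1:]


def design(data, cols, degree):
    """Intercept plus an additive basis expansion of the selected columns."""
    n = data.shape[0]
    blocks = [np.ones((n, 1))]
    blocks += [basis_expand(data[:, j], degree) for j in cols]
    return np.hstack(blocks)


def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of a least-squares fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid @ resid / n, 1e-12)  # guard against log(0)
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


def bf_bic_local(data, child, parents, degree=3):
    """Local BIC-style score of `child` given `parents` (higher is better)."""
    X = design(data, list(parents), degree)
    n = data.shape[0]
    k = X.shape[1] + 1  # regression coefficients plus the noise variance
    return 2.0 * gaussian_loglik(data[:, child], X) - k * np.log(n)


def bf_lrt(data, x, y, z=(), degree=3):
    """Approximate p-value for 'x independent of y given z'.

    Compares the fit of y on basis(z) with the fit of y on basis(z) + basis(x).
    """
    X0 = design(data, list(z), degree)
    X1 = design(data, list(z) + [x], degree)
    target = data[:, y]
    stat = 2.0 * (gaussian_loglik(target, X1) - gaussian_loglik(target, X0))
    df = X1.shape[1] - X0.shape[1]
    return float(chi2.sf(max(stat, 0.0), df))
```

In a score-based search, `bf_bic_local` would be summed over all nodes and their candidate parent sets; in a constraint-based or hybrid search, `bf_lrt` would take the place of a kernel conditional independence test. Because each fit is a single least-squares solve whose size grows with the number of basis columns rather than with an n-by-n kernel matrix, this kind of approach can be much cheaper than kernel tests on large samples.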

