LGMay 23, 2024
Hybrid Top-Down Global Causal Discovery with Local Search for Linear and Nonlinear Additive Noise ModelsSujai Hiremath, Jacqueline R. M. A. Maasch, Mengxiao Gao et al.
Learning the unique directed acyclic graph corresponding to an unknown causal model is a challenging task. Methods based on functional causal models can identify a unique graph, but either suffer from the curse of dimensionality or impose strong parametric assumptions. To address these challenges, we propose a novel hybrid approach for global causal discovery in observational data that leverages local causal substructures. We first present a topological sorting algorithm that leverages ancestral relationships in linear structural causal models to establish a compact top-down hierarchical ordering, encoding more causal information than linear orderings produced by existing methods. We demonstrate that this approach generalizes to nonlinear settings with arbitrary noise. We then introduce a nonparametric constraint-based algorithm that prunes spurious edges by searching for local conditioning sets, achieving greater accuracy than current methods. We provide theoretical guarantees for correctness and worst-case polynomial time complexities, with empirical validation on synthetic data.
LGOct 15, 2024
LoSAM: Local Search in Additive Noise Models with Mixed Mechanisms and General Noise for Global Causal DiscoverySujai Hiremath, Promit Ghosal, Kyra Gan
Inferring causal relationships from observational data is crucial when experiments are costly or infeasible. Additive noise models (ANMs) enable unique directed acyclic graph (DAG) identification, but existing sample-efficient ANM methods often rely on restrictive assumptions on the data generating process, limiting their applicability to real-world settings. We propose local search in additive noise models, LoSAM, a topological ordering method for learning a unique DAG in ANMs with mixed causal mechanisms and general noise distributions. We introduce new causal substructures and criteria for identifying roots and leaves, enabling efficient top-down learning. We prove asymptotic consistency and polynomial runtime, ensuring scalability and sample efficiency. We test LoSAM on synthetic and real-world data, demonstrating state-of-the-art performance across all mixed mechanism settings.
LGOct 16, 2025
From Guess2Graph: When and How Can Unreliable Experts Safely Boost Causal Discovery in Finite Samples?Sujai Hiremath, Dominik Janzing, Philipp Faller et al.
Causal discovery algorithms often perform poorly with limited samples. While integrating expert knowledge (including from LLMs) as constraints promises to improve performance, guarantees for existing methods require perfect predictions or uncertainty estimates, making them unreliable for practical use. We propose the Guess2Graph (G2G) framework, which uses expert guesses to guide the sequence of statistical tests rather than replacing them. This maintains statistical consistency while enabling performance improvements. We develop two instantiations of G2G: PC-Guess, which augments the PC algorithm, and gPC-Guess, a learning-augmented variant designed to better leverage high-quality expert input. Theoretically, both preserve correctness regardless of expert error, with gPC-Guess provably outperforming its non-augmented counterpart in finite samples when experts are "better than random." Empirically, both show monotonic improvement with expert accuracy, with gPC-Guess achieving significantly stronger gains.
LGJun 29, 2025
When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal DiscoveryDominik Meier, Sujai Hiremath, Promit Ghosal et al.
Distinguishing cause and effect from bivariate observational data is a foundational problem in many disciplines, but challenging without additional assumptions. Additive noise models (ANMs) are widely used to enable sample-efficient bivariate causal discovery. However, conventional ANM-based methods fail when unobserved mediators corrupt the causal relationship between variables. This paper makes three key contributions: first, we rigorously characterize why standard ANM approaches break down in the presence of unmeasured mediators. Second, we demonstrate that prior solutions for hidden mediation are brittle in finite sample settings, limiting their practical utility. To address these gaps, we propose Bivariate Denoising Diffusion (BiDD) for causal discovery, a method designed to handle latent noise introduced by unmeasured mediators. Unlike prior methods that infer directionality through mean squared error loss comparisons, our approach introduces a novel independence test statistic: during the noising and denoising processes for each variable, we condition on the other variable as input and evaluate the independence of the predicted noise relative to this input. We prove asymptotic consistency of BiDD under the ANM, and conjecture that it performs well under hidden mediation. Experiments on synthetic and real-world data demonstrate consistent performance, outperforming existing methods in mediator-corrupted settings while maintaining strong performance in mediator-free settings.