LG

Sep 28, 2025

Improving constraint-based discovery with robust propagation and reliable LLM priors

Ruiqi Lyu, Alistair Turcan, Martin Jinye Zhang, Bryan Wilder

arXiv:2509.23570v1

h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of unreliable edge orientation in causal discovery for scientific modeling and decision-making, representing an incremental improvement over existing methods.

The paper tackles cascading errors in constraint-based causal discovery by introducing MosaCD, which combines high-confidence seeds from CI tests and LLM annotations with a confidence-down propagation strategy that orients the most reliable edges first. Across multiple real-world graphs, it achieves higher accuracy in final graph construction than existing constraint-based methods.

Learning causal structure from observational data is central to scientific modeling and decision-making. Constraint-based methods aim to recover conditional independence (CI) relations in a causal directed acyclic graph (DAG). Classical approaches such as PC and subsequent methods orient v-structures first and then propagate edge directions from these seeds, assuming perfect CI tests and exhaustive search of separating subsets -- assumptions often violated in practice, leading to cascading errors in the final graph. Recent work has explored using large language models (LLMs) as experts, prompting sets of nodes for edge directions, and could augment edge orientation when assumptions are not met. However, such methods implicitly assume perfect experts, which is unrealistic for hallucination-prone LLMs. We propose MosaCD, a causal discovery method that propagates edges from a high-confidence set of seeds derived from both CI tests and LLM annotations. To filter hallucinations, we introduce shuffled queries that exploit LLMs' positional bias, retaining only high-confidence seeds. We then apply a novel confidence-down propagation strategy that orients the most reliable edges first, and can be integrated with any skeleton-based discovery method. Across multiple real-world graphs, MosaCD achieves higher accuracy in final graph construction than existing constraint-based methods, largely due to the improved reliability of initial seeds and robust propagation strategies.
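The shuffled-query idea in the abstract can be illustrated with a minimal sketch. This is not MosaCD's actual procedure: the abstract does not give the prompt format, the number of queries, or the confidence threshold, so `ask`, `n_rounds`, and `min_agreement` below are hypothetical stand-ins. The sketch presents both orientations of each edge in both orders and keeps an edge as a seed only when the answers agree regardless of option order, so an oracle that merely favors the first-listed option is filtered out.

```python
from collections import Counter
from typing import Callable, Sequence

Edge = tuple[str, str]                      # (cause, effect)
Oracle = Callable[[Sequence[Edge]], Edge]   # hypothetical stand-in for an LLM call


def shuffled_seed_filter(
    edges: Sequence[Edge],
    ask: Oracle,
    n_rounds: int = 3,           # assumed value, not taken from the paper
    min_agreement: float = 1.0,  # assumed threshold, not taken from the paper
) -> list[Edge]:
    """Return orientations that the oracle picks consistently across option orders."""
    seeds: list[Edge] = []
    for u, v in edges:
        forward, backward = (u, v), (v, u)
        votes: Counter = Counter()
        for _ in range(n_rounds):
            # Present the same two options in both orders.
            for options in ([forward, backward], [backward, forward]):
                choice = ask(options)
                if choice in (forward, backward):
                    votes[choice] += 1
        if not votes:
            continue
        direction, count = votes.most_common(1)[0]
        # Invalid answers count against agreement because the denominator
        # is the total number of queries issued for this edge.
        if count / (2 * n_rounds) >= min_agreement:
            seeds.append(direction)
    return seeds


def position_biased(options: Sequence[Edge]) -> Edge:
    """Toy oracle that always picks whichever option is listed first."""
    return options[0]


def consistent(options: Sequence[Edge]) -> Edge:
    """Toy oracle that always picks the orientation whose cause is 'smoking'."""
    return next(o for o in options if o[0] == "smoking")


if __name__ == "__main__":
    candidate_edges = [("smoking", "cancer")]
    print(shuffled_seed_filter(candidate_edges, consistent))
    print(shuffled_seed_filter(candidate_edges, position_biased))
```

With the toy oracles, the consistent one yields a seed while the position-biased one splits its votes evenly between the two orientations and is dropped. A real system would replace `ask` with an actual LLM query and would likely tune the threshold rather than require unanimous agreement.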
