Expert-Aided Causal Discovery of Ancestral Graphs
This work addresses unreliable causal inference under latent confounding, a problem faced by researchers and practitioners in fields such as healthcare and the social sciences. It is an incremental improvement that integrates expert knowledge into causal discovery.
The paper tackles two weaknesses of causal discovery algorithms: brittleness under data scarcity and the lack of uncertainty quantification. It introduces Ancestral GFlowNets (AGFNs), which sample ancestral graphs proportionally to a score-based belief distribution. It then shows that AGFN is competitive on synthetic and real-world datasets, and that incorporating expert or LLM feedback improves inference quality.
Causal discovery (CD) algorithms are notably brittle when data is scarce, inferring unreliable causal relations that may contradict expert knowledge, especially when considering latent confounders. Furthermore, the lack of uncertainty quantification in most CD methods hinders users from diagnosing and refining results. To address these issues, we introduce Ancestral GFlowNets (AGFNs). AGFN samples ancestral graphs (AGs) proportionally to a score-based belief distribution representing our epistemic uncertainty over the causal relationships. Building upon this distribution, we propose an elicitation framework for expert-driven assessment. This framework comprises an optimal experimental design to probe the expert and a scheme to incorporate the obtained feedback into AGFN. Our experiments show that: i) AGFN is competitive against other methods that address latent confounding on both synthetic and real-world datasets; and ii) our design for incorporating feedback from a (simulated) human expert or a Large Language Model (LLM) improves inference quality.
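To make the belief distribution and the elicitation loop more concrete, here is a toy Python sketch under strong simplifying assumptions. It is not the authors' method. It replaces the AGFN sampler with exact enumeration over four hand-picked candidate graphs with hypothetical log-scores, so p(G) is proportional to exp(score(G)). It models the expert as answering "which edge type joins X and Y?" correctly with probability 1 - eps and uniformly at random otherwise. It uses expected information gain as the experimental-design criterion and folds the answer back in with Bayes' rule. All names (CANDIDATES, LOG_SCORES, best_query, and so on) and numbers are illustrative inventions.

```python
import math

# Hypothetical candidate ancestral graphs over variables A, B, C.
# Each maps an ordered pair (X, Y) to the edge between them:
# "->" (X -> Y), "<-" (X <- Y), "<->" (latent confounding), or "none".
CANDIDATES = [
    {("A", "B"): "->",  ("B", "C"): "->",  ("A", "C"): "none"},
    {("A", "B"): "<->", ("B", "C"): "->",  ("A", "C"): "none"},
    {("A", "B"): "->",  ("B", "C"): "<->", ("A", "C"): "none"},
    {("A", "B"): "<-",  ("B", "C"): "->",  ("A", "C"): "->"},
]
# Hypothetical log-scores (e.g. a penalized likelihood); higher is better.
LOG_SCORES = [-10.2, -10.5, -11.0, -12.3]
RELATIONS = ["->", "<-", "<->", "none"]


def belief(log_scores):
    """Exact p(G) proportional to exp(score(G)): the target a GFlowNet would learn to sample."""
    m = max(log_scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in log_scores]
    z = sum(w)
    return [x / z for x in w]


def answer_likelihood(answer, truth, eps):
    """Assumed noisy expert: correct w.p. 1 - eps, else uniform over the other relations."""
    return 1.0 - eps if answer == truth else eps / (len(RELATIONS) - 1)


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def update(probs, pair, answer, eps):
    """Bayesian update: p(G | answer) proportional to p(G) * P(answer | G)."""
    post = [p * answer_likelihood(answer, g[pair], eps)
            for p, g in zip(probs, CANDIDATES)]
    z = sum(post)
    return [x / z for x in post]


def expected_info_gain(probs, pair, eps):
    """Expected drop in entropy over graphs from asking about `pair`."""
    h0 = entropy(probs)
    gain = 0.0
    for answer in RELATIONS:
        # Marginal probability that the expert gives this answer.
        p_ans = sum(p * answer_likelihood(answer, g[pair], eps)
                    for p, g in zip(probs, CANDIDATES))
        if p_ans > 0:
            gain += p_ans * (h0 - entropy(update(probs, pair, answer, eps)))
    return gain


def best_query(probs, eps):
    """Pick the variable pair whose answer is expected to be most informative."""
    pairs = list(CANDIDATES[0].keys())
    return max(pairs, key=lambda pair: expected_info_gain(probs, pair, eps))


if __name__ == "__main__":
    eps = 0.1
    probs = belief(LOG_SCORES)
    pair = best_query(probs, eps)
    print("ask expert about", pair)
    # Suppose the (simulated) expert reports latent confounding for that pair.
    probs = update(probs, pair, "<->", eps)
    print([round(p, 3) for p in probs])
```

With these made-up numbers, the edge type between A and B is the most uncertain, so that pair is queried first. A "<->" answer then moves most of the probability mass onto the candidate with a bidirected A-B edge. A real system would replace the enumeration with a trained sampler and estimate these quantities from samples.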