AI · Dec 18, 2024

Retrieving Classes of Causal Orders with Inconsistent Knowledge Bases

Federico Baldo, Simon Ferreira, Charles K. Assaad

arXiv:2412.14019v3 · 2.3 · h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the challenge of extracting reliable causal knowledge from noisy text data for applications in fields such as epidemiology and public health, and represents an incremental improvement over existing methods.

The paper tackles the unreliability of causal discovery from text metadata by proposing a method that derives classes of causal orders maximizing consistency scores computed from Large Language Model outputs. It shows that the method recovers correct causal orders on established benchmarks and on real-world datasets.

Traditional causal discovery methods often rely on strong, untestable assumptions, which makes them unreliable in real applications. In this context, Large Language Models (LLMs) have emerged as a promising alternative for extracting causal knowledge from text-based metadata, which consolidates domain expertise. However, LLMs tend to be unreliable and prone to hallucinations, so strategies are needed that account for these limitations. One effective strategy is to use a consistency measure to assess reliability. Additionally, most text metadata does not clearly distinguish direct causal relationships from indirect ones, which further complicates the discovery of a causal DAG. As a result, focusing on causal orders rather than causal DAGs emerges as a more practical and robust approach. We present a new method to derive a class of acyclic tournaments, which represent plausible causal orders, that maximize a consistency score derived from an LLM. Our approach starts by calculating pairwise consistency scores between variables, resulting in a semi-complete partially directed graph that consolidates these scores into an abstraction of the maximally consistent causal orders. Using this structure, we identify optimal acyclic tournaments, focusing on those that maximize consistency across all configurations. We then show how both the abstraction and the class of causal orders can be used to estimate causal effects. We tested our method on well-established benchmarks as well as on real-world datasets from epidemiology and public health. Our results demonstrate the effectiveness of our approach in recovering the correct causal order.
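The pipeline in the abstract (pairwise scores, then an abstraction graph, then the class of best orders) can be illustrated with a small brute-force sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the input format (each unordered pair queried several times, with True meaning the LLM answered that the first variable precedes the second), the score (the fraction of agreeing answers), the tie rule for leaving an edge undirected, and every function name (consistency_scores, abstraction, optimal_orders, preceding_variables). An acyclic tournament over n variables is equivalent to a linear order, so the sketch enumerates permutations. This is only feasible for a handful of variables. The last helper relies on a standard back-door fact, assuming no hidden confounders: in a single correct causal order, the variables preceding X contain all parents of X and none of its descendants.

```python
from __future__ import annotations
from itertools import combinations, permutations

def consistency_scores(answers: dict[tuple[str, str], list[bool]]) -> dict[tuple[str, str], float]:
    """Hypothetical scoring: answers[(a, b)] lists sampled LLM votes, True meaning
    'a precedes b'. Each unordered pair must appear exactly once."""
    scores = {}
    for (a, b), votes in answers.items():
        p = sum(votes) / len(votes)   # fraction of votes for a -> b
        scores[(a, b)] = p
        scores[(b, a)] = 1.0 - p      # remaining mass goes to b -> a
    return scores

def abstraction(variables, scores, tol=1e-9):
    """Semi-complete partially directed graph: every pair is adjacent; a pair is
    oriented toward its higher-scoring direction and left undirected on a tie."""
    directed, undirected = set(), set()
    for a, b in combinations(variables, 2):
        diff = scores[(a, b)] - scores[(b, a)]
        if diff > tol:
            directed.add((a, b))
        elif diff < -tol:
            directed.add((b, a))
        else:
            undirected.add(frozenset((a, b)))
    return directed, undirected

def order_score(order, scores):
    """Sum of the scores of the pairwise directions that the order agrees with."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(s for (a, b), s in scores.items() if pos[a] < pos[b])

def optimal_orders(variables, scores, tol=1e-9):
    """Brute force over all linear orders (acyclic tournaments); keeps every maximizer."""
    best, best_orders = float("-inf"), []
    for order in permutations(variables):
        s = order_score(order, scores)
        if s > best + tol:
            best, best_orders = s, [order]
        elif abs(s - best) <= tol:
            best_orders.append(order)
    return best, best_orders

def preceding_variables(order, x):
    """Variables before x in one causal order (a valid back-door set if the order is correct)."""
    return set(order[:order.index(x)])

if __name__ == "__main__":
    answers = {("A", "B"): [True, True], ("A", "C"): [True, True, True], ("B", "C"): [True, False]}
    scores = consistency_scores(answers)
    print(abstraction(["A", "B", "C"], scores))
    best, orders = optimal_orders(["A", "B", "C"], scores)
    print(best, orders)  # 2.5 [('A', 'B', 'C'), ('A', 'C', 'B')]
```

In this toy input the votes on B and C are split, so that pair stays undirected in the abstraction. The class of optimal orders therefore contains both A, B, C and A, C, B. The preceding-variable sets for C differ between these two orders, which is why effect estimation over a whole class of orders needs more care than the single-order helper above.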

