LG, CL | Oct 31, 2024

Failure Modes of LLMs for Causal Reasoning on Narratives

Khurram Yamin, Shantanu Gupta, Gaurav R. Ghosal, Zachary C. Lipton, Bryan Wilder

arXiv:2410.23884v5 | 14.211 citations | h-index: 58 | Has Code

Originality: Incremental advance

AI Analysis

This work identifies systematic failure modes of LLMs in causal reasoning. The advance is incremental, but it matters for improving autonomous decision-making in AI systems.

The study investigates how large language models (LLMs) perform causal reasoning on narratives. It finds that they often rely on superficial heuristics, such as inferring causality from event order or recalling memorized knowledge without attending to context, and that simple reformulations of the task can improve robustness.

The ability to robustly identify causal relationships is essential for autonomous decision-making and adaptation to novel scenarios. However, accurately inferring causal structure requires integrating both world knowledge and abstract logical reasoning. In this work, we investigate the interaction between these two capabilities through the representative task of causal reasoning over narratives. Through controlled synthetic, semi-synthetic, and real-world experiments, we find that state-of-the-art large language models (LLMs) often rely on superficial heuristics -- for example, inferring causality from event order or recalling memorized world knowledge without attending to context. Furthermore, we show that simple reformulations of the task can elicit more robust reasoning behavior. Our evaluation spans a range of causal structures, from linear chains to complex graphs involving colliders and forks. These findings uncover systematic patterns in how LLMs perform causal reasoning and lay the groundwork for developing methods that better align LLM behavior with principled causal inference.
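
To make the causal structures and the event-order heuristic mentioned in the abstract concrete, here is a minimal Python sketch. It is not the authors' code or evaluation setup: the toy graphs, the event names `A`, `B`, `C`, and the helper functions `causes` and `order_heuristic` are all hypothetical, chosen only to show how a "first mentioned means cause" shortcut can disagree with the true causal graph.

```python
from collections import deque

# Hypothetical toy graphs: each maps an event to the events it directly causes.
CHAIN = {"A": ["B"], "B": ["C"], "C": []}      # A -> B -> C
FORK = {"A": ["B", "C"], "B": [], "C": []}     # B <- A -> C
COLLIDER = {"A": ["C"], "B": ["C"], "C": []}   # A -> C <- B


def causes(graph, source, target):
    """Return True if `source` is a direct or indirect cause of `target`."""
    queue = deque(graph[source])
    seen = set()
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            queue.extend(graph[node])
    return False


def order_heuristic(narrative_order, source, target):
    """Shortcut (assumed for illustration): the event mentioned first is the cause."""
    return narrative_order.index(source) < narrative_order.index(target)


# Chain narrated in reverse, effect first: the shortcut misses a true cause.
reverse_order = ["C", "B", "A"]
chain_truth = causes(CHAIN, "A", "C")
chain_guess = order_heuristic(reverse_order, "A", "C")

# Fork narrated as A, B, C: B is mentioned before C but does not cause it.
fork_truth = causes(FORK, "B", "C")
fork_guess = order_heuristic(["A", "B", "C"], "B", "C")

print(chain_truth, chain_guess)  # True False
print(fork_truth, fork_guess)    # False True
```

In both cases the ordering shortcut disagrees with the graph, which is the kind of mismatch the abstract describes when it says models infer causality from event order.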
