AI · Oct 15, 2025

Do Large Language Models Show Biases in Causal Learning? Insights from Contingency Judgment

arXiv:2510.13985v1 · 1 citation · h-index: 7
Originality: Incremental advance
AI Analysis

This research addresses a problem relevant to AI safety and reliability: it highlights biases in LLMs that could affect decision-making in domains requiring accurate causal reasoning. Its contribution is incremental, since it applies a known cognitive paradigm to LLMs.

The study investigated whether large language models (LLMs) develop the illusion of causality when evaluating null contingency scenarios in medical contexts. All evaluated models systematically inferred unwarranted causal relationships, indicating susceptibility to this bias.

Causal learning is the cognitive process of developing the capability of making causal inferences based on available information, often guided by normative principles. This process is prone to errors and biases, such as the illusion of causality, in which people perceive a causal relationship between two variables despite lacking supporting evidence. This cognitive bias has been proposed to underlie many societal problems, including social prejudice, stereotype formation, misinformation, and superstitious thinking. In this work, we examine whether large language models are prone to developing causal illusions when faced with a classic cognitive science paradigm: the contingency judgment task. To investigate this, we constructed a dataset of 1,000 null contingency scenarios (in which the available information is not sufficient to establish a causal relationship between variables) within medical contexts and prompted LLMs to evaluate the effectiveness of potential causes. Our findings show that all evaluated models systematically inferred unwarranted causal relationships, revealing a strong susceptibility to the illusion of causality. While there is ongoing debate about whether LLMs genuinely understand causality or merely reproduce causal language without true comprehension, our findings support the latter hypothesis and raise concerns about the use of language models in domains where accurate causal reasoning is essential for informed decision-making.
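To make the notion of a null contingency scenario concrete, the sketch below computes the ΔP index, a standard normative measure in the contingency learning literature: the probability of the outcome when the potential cause is present minus its probability when the cause is absent. The abstract does not state which index the authors used to build their dataset, so treating ΔP = 0 as the null condition is an assumption here. The names `ContingencyTable`, `delta_p`, and `is_null_contingency` are hypothetical, and the patient counts are illustrative, not drawn from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 counts for a potential cause (e.g. a drug) and an outcome (e.g. recovery)."""
    a: int  # cause present, outcome present
    b: int  # cause present, outcome absent
    c: int  # cause absent, outcome present
    d: int  # cause absent, outcome absent


def delta_p(t: ContingencyTable) -> float:
    """Return P(outcome | cause) - P(outcome | no cause)."""
    if t.a + t.b == 0 or t.c + t.d == 0:
        raise ValueError("both the cause-present and cause-absent rows need observations")
    p_outcome_given_cause = t.a / (t.a + t.b)
    p_outcome_given_no_cause = t.c / (t.c + t.d)
    return p_outcome_given_cause - p_outcome_given_no_cause


def is_null_contingency(t: ContingencyTable, tol: float = 1e-9) -> bool:
    """Assumed null condition: the outcome is equally likely with or without the cause."""
    return abs(delta_p(t)) <= tol


# Illustrative counts (not from the paper): 75% of patients recover
# whether or not they take the drug, so the data give no support for an effect.
null_scenario = ContingencyTable(a=30, b=10, c=15, d=5)

# Contrast case: 75% recover with the drug, 25% without it.
effective_scenario = ContingencyTable(a=30, b=10, c=5, d=15)

if __name__ == "__main__":
    print(delta_p(null_scenario), is_null_contingency(null_scenario))
    print(delta_p(effective_scenario), is_null_contingency(effective_scenario))
```

In the null scenario, many patients who took the drug recovered (30 of 40), which is the kind of pattern that can invite an effectiveness judgment even though the recovery rate is identical without the drug. Under the assumption above, any effectiveness rating above zero for such a scenario would be an unwarranted causal inference.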

