CL, LG · May 28, 2025

If Pigs Could Fly... Can LLMs Logically Reason Through Counterfactuals?

arXiv:2505.22318v1 · h-index: 46
Originality: Incremental advance
AI Analysis

The paper addresses a key limitation of LLMs in real-world applications where reasoning must be independent of factual knowledge. Its contribution is an incremental advance, delivered through a new prompting technique.

The paper examines why LLMs struggle with logical reasoning in counterfactual scenarios that conflict with their parametric knowledge. It introduces a new dataset, CounterLogic, and, evaluating 11 LLMs across 6 datasets, finds an average 27% accuracy drop when models reason through counterfactual information. The authors propose a prompting method called Self-Segregate that narrows this gap to 11% and raises overall accuracy by 7.5%.

Large Language Models (LLMs) demonstrate impressive reasoning capabilities in familiar contexts, but struggle when the context conflicts with their parametric knowledge. To investigate this phenomenon, we introduce CounterLogic, a dataset containing 1,800 examples across 9 logical schemas, explicitly designed to evaluate logical reasoning through counterfactual (hypothetical knowledge-conflicting) scenarios. Our systematic evaluation of 11 LLMs across 6 different datasets reveals a consistent performance degradation, with accuracies dropping by 27% on average when reasoning through counterfactual information. We propose Self-Segregate, a prompting method enabling metacognitive awareness (explicitly identifying knowledge conflicts) before reasoning. Our method dramatically narrows the average performance gaps from 27% to just 11%, while significantly increasing the overall accuracy (+7.5%). We discuss the implications of these findings and draw parallels to human cognitive processes, particularly on how humans disambiguate conflicting information during reasoning tasks. Our findings offer practical insights for understanding and enhancing LLMs' reasoning capabilities in real-world applications, especially where models must logically reason independently of their factual knowledge.
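The abstract describes Self-Segregate only at a high level: the model first identifies knowledge conflicts explicitly, then reasons. The sketch below illustrates that two-stage idea as a prompt builder. It is not the authors' prompt. The `SyllogismItem` format, the function name, the step wording, and the example premises are all assumptions made for illustration, since this summary gives neither the paper's prompt text nor the CounterLogic schema.

```python
from dataclasses import dataclass


@dataclass
class SyllogismItem:
    """Hypothetical item format; the real CounterLogic schema is not given here."""
    premises: list[str]
    conclusion: str


def build_self_segregate_prompt(item: SyllogismItem) -> str:
    """Assemble a two-stage prompt: flag knowledge conflicts, then judge validity.

    The wording is an illustrative assumption, not the prompt used in the paper.
    """
    # Number the premises so the model can refer to each one in step 1.
    premise_lines = "\n".join(
        f"{i}. {p}" for i, p in enumerate(item.premises, start=1)
    )
    return (
        "Premises:\n"
        f"{premise_lines}\n"
        f"Conclusion: {item.conclusion}\n\n"
        "Step 1: For each premise, state whether it agrees or conflicts with "
        "what you know about the real world.\n"
        "Step 2: Setting that judgement aside, treat every premise as true and "
        "decide whether the conclusion follows logically.\n"
        "Answer with 'valid' or 'invalid'."
    )


# Made-up counterfactual example in the spirit of the paper's title.
item = SyllogismItem(
    premises=["All pigs can fly.", "Everything that can fly has wings."],
    conclusion="All pigs have wings.",
)
print(build_self_segregate_prompt(item))
```

Putting the conflict check in its own step keeps the factual judgement ("pigs cannot fly") separate from the validity judgement, which is the separation the abstract credits for narrowing the performance gap.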

