Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
This work addresses the problem of evaluating reasoning in LLMs by giving researchers a theory-grounded, contamination-resistant method, though its contribution is incremental in that it applies existing cognitive theory to models.
The study investigated whether language models' reasoning errors follow human fallacy patterns, finding that as model capability increases, a larger share of incorrect answers match predicted fallacies, and reversing premise order reduces fallacy production.
We study logical reasoning in language models by asking whether their errors follow established human fallacy patterns. Using the Erotetic Theory of Reasoning (ETR) and its open-source implementation, PyETR, we programmatically generate 383 formally specified reasoning problems and evaluate 38 models. For each response, we judge logical correctness and, when incorrect, whether it matches an ETR-predicted fallacy. Two results stand out: (i) as a capability proxy (Chatbot Arena Elo) increases, a larger share of a model's incorrect answers are ETR-predicted fallacies ($\rho = 0.360$, $p = 0.0265$), while overall correctness on this dataset shows no correlation with capability; (ii) reversing premise order significantly reduces fallacy production for many models, mirroring human order effects. Methodologically, PyETR provides an open-source pipeline for unbounded, synthetic, contamination-resistant reasoning tests linked to a cognitive theory, enabling analyses that focus on error composition rather than error rate.
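The distinction between error composition and error rate can be made concrete with a minimal sketch. It assumes each response has already been judged on two points: whether it is logically correct and, if not, whether it matches an ETR-predicted fallacy. The names `Judgement`, `error_rate`, and `fallacy_share` are hypothetical and chosen for illustration; they are not part of PyETR, and the toy data below is invented purely to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Judgement:
    """One judged model response (hypothetical structure, not PyETR's API)."""
    correct: bool
    # Only meaningful when `correct` is False.
    matches_etr_fallacy: bool = False


def error_rate(judgements: Sequence[Judgement]) -> float:
    """Fraction of all responses that are logically incorrect."""
    return sum(not j.correct for j in judgements) / len(judgements)


def fallacy_share(judgements: Sequence[Judgement]) -> Optional[float]:
    """Fraction of incorrect responses that match an ETR-predicted fallacy.

    Returns None when the model made no errors, since the share is undefined.
    """
    errors = [j for j in judgements if not j.correct]
    if not errors:
        return None
    return sum(j.matches_etr_fallacy for j in errors) / len(errors)


# Toy data: two models with the same error rate but different error composition.
model_a = [
    Judgement(correct=True),
    Judgement(correct=True),
    Judgement(correct=False, matches_etr_fallacy=True),
    Judgement(correct=False, matches_etr_fallacy=False),
]
model_b = [
    Judgement(correct=True),
    Judgement(correct=True),
    Judgement(correct=False, matches_etr_fallacy=True),
    Judgement(correct=False, matches_etr_fallacy=True),
]

for name, judged in [("model_a", model_a), ("model_b", model_b)]:
    print(name, error_rate(judged), fallacy_share(judged))
```

In this toy case both models err on half of the problems, yet all of `model_b`'s errors are predicted fallacies while only half of `model_a`'s are. A per-model share of this kind is the sort of quantity that could then be correlated with a capability proxy such as Chatbot Arena Elo.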