CL AI LO Jan 30, 2024

Conditional and Modal Reasoning in Large Language Models

arXiv:2401.17169v4 31 citations h-index: 15 EMNLP
Originality Synthesis-oriented
AI Analysis

This work assesses how closely LLMs' reasoning abilities match human logical capabilities, a question of interest to AI researchers and cognitive scientists. The contribution is incremental in that it builds on existing evaluation frameworks.

The study evaluated 29 large language models on logical inferences involving conditionals and epistemic modals. Even the best models made basic errors and gave logically inconsistent judgments. Zero-shot chain-of-thought prompting reduced mistakes but did not close the gap with reported human reasoning.

The reasoning abilities of large language models (LLMs) are the topic of a growing body of research in AI and cognitive science. In this paper, we probe the extent to which twenty-nine LLMs are able to distinguish logically correct inferences from logically fallacious ones. We focus on inference patterns involving conditionals (e.g., 'If Ann has a queen, then Bob has a jack') and epistemic modals (e.g., 'Ann might have an ace', 'Bob must have a king'). These inferences have been of special interest to logicians, philosophers, and linguists, since they play a central role in the fundamental human ability to reason about distal possibilities. Assessing LLMs on these inferences is thus highly relevant to the question of how much the reasoning abilities of LLMs match those of humans. All the LLMs we tested make some basic mistakes with conditionals or modals, though zero-shot chain-of-thought prompting helps them make fewer mistakes. Even the best performing LLMs make basic errors in modal reasoning, display logically inconsistent judgments across inference patterns involving epistemic modals and conditionals, and give answers about complex conditional inferences that do not match reported human judgments. These results highlight gaps in basic logical reasoning in today's LLMs.
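To make the distinction between logically correct and logically fallacious inferences concrete, here is a minimal sketch that checks four textbook conditional patterns by brute-force truth table. It is not the paper's evaluation code. It assumes the classical material conditional, a deliberate simplification that does not capture natural-language conditionals or epistemic modals such as 'might' and 'must'. The names `implies` and `is_valid` are hypothetical helpers introduced for illustration.

```python
from itertools import product

def implies(p, q):
    # Material conditional: an assumption of this sketch, not a claim
    # about how natural-language "if ... then" behaves.
    return (not p) or q

def is_valid(premises, conclusion, n_atoms=2):
    # An inference is classically valid if no assignment makes every
    # premise true while making the conclusion false.
    for values in product([True, False], repeat=n_atoms):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False
    return True

# a = "Ann has a queen", b = "Bob has a jack"
conditional = lambda a, b: implies(a, b)

modus_ponens = is_valid([conditional, lambda a, b: a], lambda a, b: b)
modus_tollens = is_valid([conditional, lambda a, b: not b], lambda a, b: not a)
affirming_consequent = is_valid([conditional, lambda a, b: b], lambda a, b: a)
denying_antecedent = is_valid([conditional, lambda a, b: not a], lambda a, b: not b)

for name, result in [
    ("modus ponens", modus_ponens),
    ("modus tollens", modus_tollens),
    ("affirming the consequent", affirming_consequent),
    ("denying the antecedent", denying_antecedent),
]:
    print(f"{name}: {'valid' if result else 'fallacious'}")
```

Under these assumptions, the first two patterns come out valid and the last two fallacious. Modal patterns would need a richer semantics than a two-valued truth table.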

Code Implementations: 1 repo
