CL · AI · Jun 5, 2024

Are LLMs classical or nonmonotonic reasoners? Lessons from generics

arXiv:2406.06590v2 · 28 citations
Originality: Incremental advance
AI Analysis

This work addresses how researchers and developers can accurately assess reasoning capabilities in LLMs. It reveals pitfalls in attributing human-like reasoning to these models, though the advance is incremental.

The study investigated whether large language models (LLMs) can perform nonmonotonic reasoning, a key aspect of human cognition, by testing seven state-of-the-art models on tasks involving generics like 'Birds fly' and exceptions like 'Penguins don't fly'. The results showed that while LLMs exhibited some human-like reasoning patterns, they failed to maintain stable beliefs when presented with additional supporting examples or unrelated information.

Recent scholarship on reasoning in LLMs has supplied evidence of impressive performance and flexible adaptation to machine-generated or human feedback. Nonmonotonic reasoning, crucial to human cognition for navigating the real world, remains a challenging yet understudied task. In this work, we study nonmonotonic reasoning capabilities of seven state-of-the-art LLMs in one abstract and one commonsense reasoning task featuring generics, such as 'Birds fly', and exceptions, such as 'Penguins don't fly' (see Fig. 1). While LLMs exhibit reasoning patterns in accordance with human nonmonotonic reasoning abilities, they fail to maintain stable beliefs on truth conditions of generics upon the addition of supporting examples ('Owls fly') or unrelated information ('Lions have manes'). Our findings highlight pitfalls in attributing human reasoning behaviours to LLMs, as well as in assessing general capabilities, while consistent reasoning remains elusive.

Code Implementations: 1 repo