CL · AI · LG · Jun 4, 2024

Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

arXiv:2406.02787v1 · 24 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of evaluating LLM reasoning capabilities for researchers and practitioners, but its contribution is incremental, building on existing work in logic and context analysis.

This study investigates whether large language models (LLMs) can perform genuine logical reasoning by comparing their performance on abstract versus contextualized logical problems across domains, finding that abstract problems alone may not accurately benchmark real-world reasoning due to contextual influences.

This study intends to systematically disentangle pure logical reasoning from text understanding by contrasting abstract and contextualized logical problems from a comprehensive set of domains. We explore whether LLMs demonstrate genuine reasoning capabilities across various domains when the underlying logical structure remains constant. We focus on two main questions: (1) Can abstract logical problems alone accurately benchmark an LLM's reasoning ability in real-world scenarios, disentangled from contextual support in practical settings? (2) Does fine-tuning LLMs on abstract logic problems generalize to contextualized logic problems, and vice versa? To investigate these questions, we focus on standard propositional logic, specifically propositional deductive and abductive reasoning. In particular, we construct instantiated datasets for deductive and abductive reasoning with 4 levels of difficulty, encompassing 12 distinct categories or domains based on the categorization of Wikipedia. Our experiments aim to provide insights into disentangling context in logical reasoning, the true reasoning capabilities of LLMs, and their generalization potential. The code and dataset are available at: https://github.com/agiresearch/ContextHub.
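
To make the abstract-versus-contextualized contrast concrete, here is a minimal Python sketch of one propositional deduction problem rendered two ways while its logical structure, and therefore its answer, stays fixed. Everything in it is an illustrative assumption: the `AbstractProblem` class, the single-rule format, the `solve` and `render_*` helpers, and the example sentences are hypothetical and are not taken from the ContextHub code or dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractProblem:
    """Hypothetical propositional deduction task (not the ContextHub format)."""
    facts: dict   # variable name -> truth value, e.g. {"p": True}
    rule: tuple   # (a, b, c) meaning "(a AND b) -> c"
    goal: str     # variable whose truth value is asked for


def solve(problem):
    """Apply the single rule once; return True, False, or None if undetermined."""
    a, b, conclusion = problem.rule
    known = dict(problem.facts)
    if known.get(a) is True and known.get(b) is True:
        known[conclusion] = True
    return known.get(problem.goal)


def render_abstract(problem):
    """Render the problem with bare symbols only."""
    a, b, c = problem.rule
    facts = " ".join(f"{var} is {val}." for var, val in problem.facts.items())
    return f"{facts} If {a} and {b}, then {c}. Is {problem.goal} true?"


def render_contextual(problem, sentences):
    """Render the same problem with each variable replaced by a domain sentence."""
    a, b, c = problem.rule
    facts = " ".join(
        f"It is {'true' if val else 'false'} that {sentences[var]}."
        for var, val in problem.facts.items()
    )
    return (
        f"{facts} If {sentences[a]} and {sentences[b]}, then {sentences[c]}. "
        f"Can we conclude that {sentences[problem.goal]}?"
    )


problem = AbstractProblem(facts={"p": True, "q": True}, rule=("p", "q", "r"), goal="r")

# Made-up domain instantiation; the wording is an assumption for illustration.
geography = {
    "p": "the snowpack melts early",
    "q": "spring rainfall is heavy",
    "r": "the river floods",
}

abstract_prompt = render_abstract(problem)
contextual_prompt = render_contextual(problem, geography)
label = solve(problem)  # the same label applies to both prompts
```

Because the label is computed from the logical structure alone, any gap in a model's accuracy between `abstract_prompt` and `contextual_prompt` would, under this sketch's assumptions, reflect the effect of context and not a change in the underlying logic.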

