AI · CL · GT · LO · Aug 28, 2024

Towards Logically Sound Natural Language Reasoning with Logic-Enhanced Language Model Agents

arXiv:2408.16081v2 · 4 citations · h-index: 22
Originality: Incremental advance
AI Analysis

The paper addresses unreliable reasoning in LLMs, a concern for users building AI and agentic applications, but it is an incremental advance because it builds on existing methods for integrating formal logic with LLMs.

The paper tackles logical errors in large language models (LLMs) used for reasoning. It proposes LELMA, a framework that integrates LLMs with formal logic to validate and refine their reasoning. LELMA achieves high accuracy in error detection and improves reasoning correctness, especially with GPT-4o.

Large language models (LLMs) are increasingly explored as general-purpose reasoners, particularly in agentic contexts. However, their outputs remain prone to mathematical and logical errors. This is especially challenging in open-ended tasks, where unstructured outputs lack explicit ground truth and may contain subtle inconsistencies. To address this issue, we propose Logic-Enhanced Language Model Agents (LELMA), a framework that integrates LLMs with formal logic to enable validation and refinement of natural language reasoning. LELMA comprises three components: an LLM-Reasoner, an LLM-Translator, and a Solver, and employs autoformalization to translate reasoning into logic representations, which are then used to assess logical validity. Using game-theoretic scenarios such as the Prisoner's Dilemma as testbeds, we highlight the limitations of both less capable (Gemini 1.0 Pro) and advanced (GPT-4o) models in generating logically sound reasoning. LELMA achieves high accuracy in error detection and improves reasoning correctness via self-refinement, particularly in GPT-4o. The study also highlights challenges in autoformalization accuracy and in evaluation of inherently ambiguous open-ended reasoning tasks.
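The Reasoner, Translator, Solver pipeline and its self-refinement step can be sketched as a validate-and-refine loop. This is a minimal illustration under stated assumptions, not the LELMA implementation. The LLM-Reasoner is replaced by a hard-coded stub. The LLM-Translator (autoformalization) is replaced by a regex parser that handles a single sentence template. The Solver is replaced by a direct check against a Prisoner's Dilemma payoff table rather than a formal logic solver. The payoff values, the `Claim` logic form, and all function names (`translate`, `solve`, `make_reasoner`, `refine`) are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical Prisoner's Dilemma payoffs for one player, keyed by
# (my_action, opponent_action); illustrative values with T > R > P > S.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


@dataclass(frozen=True)
class Claim:
    """Hypothetical logic form: against `opp`, `better` strictly beats `worse`."""
    opp: str
    better: str
    worse: str


# The one sentence template the stand-in translator understands.
PATTERN = re.compile(
    r"If the opponent plays ([CD]), playing ([CD]) is better than ([CD])\."
)


def translate(steps):
    """Rule-based stand-in for the LLM-Translator: text steps -> Claims.
    Steps that do not match the template are skipped (left unformalized)."""
    claims = []
    for step in steps:
        m = PATTERN.fullmatch(step.strip())
        if m:
            claims.append(Claim(*m.groups()))
    return claims


def solve(claim):
    """Stand-in for the Solver: check the claim against the payoff table."""
    return PAYOFF[(claim.better, claim.opp)] > PAYOFF[(claim.worse, claim.opp)]


def make_reasoner():
    """Stub for the LLM-Reasoner: makes one error, then repairs it once
    it receives non-empty feedback."""
    def reasoner(feedback):
        if not feedback:
            return ["If the opponent plays C, playing D is better than C.",
                    "If the opponent plays D, playing C is better than D."]
        return ["If the opponent plays C, playing D is better than C.",
                "If the opponent plays D, playing D is better than C."]
    return reasoner


def refine(reasoner, max_rounds=3):
    """Validate each round of reasoning and feed the claims the Solver
    rejects back to the reasoner (self-refinement).
    Returns (final_steps, rounds_used, all_claims_valid)."""
    feedback, steps = [], []
    for round_no in range(1, max_rounds + 1):
        steps = reasoner(feedback)
        feedback = [c for c in translate(steps) if not solve(c)]
        if not feedback:
            return steps, round_no, True
    return steps, max_rounds, False


if __name__ == "__main__":
    steps, rounds, ok = refine(make_reasoner())
    print(rounds, ok)  # 2 True
```

In the first round the Solver rejects the claim that cooperating beats defecting against a defector (0 is not greater than 1), so the stub reasoner is asked to retry and passes in round 2. Because only formalized claims are checked, any sentence the translator fails to parse goes unvalidated. This is a toy version of the autoformalization-accuracy issue the abstract raises.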

Code Implementations: 1 repo
