CL · AI · LG · Apr 18, 2025

LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models

arXiv:2504.14089v2 · 1 citations · h-index: 3 · EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of maintaining logical coherence and searching premises efficiently in complex reasoning tasks, and it represents a strong incremental improvement over existing methods.

The paper tackles the problem of improving logical reasoning in large language models by proposing LogicTree, a framework that automates structured proof exploration with algorithm-guided search and caching mechanisms, achieving average accuracy gains of 23.6% over chain-of-thought and 12.5% over tree-of-thought on GPT-4o.

Large language models (LLMs) have achieved remarkable multi-step reasoning capabilities across various domains. However, LLMs still face distinct challenges in complex logical reasoning, as (1) proof-finding requires systematic exploration and the maintenance of logical coherence and (2) searching for the right combination of premises at each reasoning step is inherently challenging in tasks with a large premise space. To address these challenges, we propose LogicTree, an inference-time modular framework employing algorithm-guided search to automate structured proof exploration and ensure logical coherence. Advancing beyond tree-of-thought (ToT), we incorporate a caching mechanism into LogicTree to enable effective utilization of historical knowledge, preventing reasoning stagnation and minimizing redundancy. Furthermore, we address the combinatorial complexity of premise search by decomposing it into a linear process. The refined premise selection restricts subsequent inference to at most one derivation per step, enhancing reasoning granularity and enforcing strict step-by-step reasoning. Additionally, we introduce two LLM-free heuristics for premise prioritization, enabling strategic proof search. Experimental results on five datasets demonstrate that LogicTree optimally scales inference-time computation to achieve higher proof accuracy, surpassing chain-of-thought (CoT) and ToT with average gains of 23.6% and 12.5%, respectively, on GPT-4o. Moreover, within LogicTree, GPT-4o outperforms o3-mini by 7.6% on average.
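
The ideas of a fact cache and one derivation per step can be illustrated with a toy forward-chaining search. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses no LLM, it represents rules as plain (antecedents, consequent) pairs, and the names `prove`, `rules`, and `facts` are hypothetical. Premise selection is made linear by first picking one fact from the frontier and then checking each rule's remaining antecedents against the cache, rather than enumerating premise combinations.

```python
from collections import deque

def prove(facts, rules, goal, max_steps=100):
    """Toy forward chaining with a fact cache (illustrative only).

    facts: iterable of known atoms (strings)
    rules: list of (frozenset_of_antecedents, consequent) pairs
    Returns the list of derivation steps, or None if the goal is not reached.
    """
    cache = set(facts)        # everything known or derived so far
    frontier = deque(facts)   # facts whose consequences are not yet explored
    steps = []
    if goal in cache:
        return steps
    while frontier and len(steps) < max_steps:
        fact = frontier.popleft()
        for antecedents, consequent in rules:
            if fact not in antecedents:
                continue      # rule does not use the selected fact
            if not antecedents <= cache:
                continue      # some other antecedent is still unknown
            if consequent in cache:
                continue      # cache hit: skip a redundant derivation
            # Exactly one new fact is derived per recorded step.
            cache.add(consequent)
            frontier.append(consequent)
            steps.append((sorted(antecedents), consequent))
            if consequent == goal:
                return steps
    return None

# Hypothetical example data, not taken from the paper's datasets.
facts = ["rains", "outside"]
rules = [
    (frozenset({"rains", "outside"}), "wet"),
    (frozenset({"wet"}), "cold"),
]
proof = prove(facts, rules, "cold")
```

Here `proof` holds two steps: first `wet` from `outside` and `rains`, then `cold` from `wet`. The cache check is what keeps the search from re-deriving `wet` when `outside` is later taken from the frontier.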

