AI · Dec 2, 2025

When Do Symbolic Solvers Enhance Reasoning in Large Language Models?

arXiv:2512.03272v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work targets the token overhead and incorrect answers that long chain-of-thought reasoning can cause in LLMs. It identifies the specific conditions under which symbolic solvers help, which is useful to researchers and practitioners, though the contribution is an incremental refinement of existing hybrid approaches.

The paper investigates when symbolic solvers enhance reasoning in large language models (LLMs). It finds that they help only when a problem requires limited implicit reasoning but involves an ample search space. The gains are significant on constraint satisfaction problems that require repeated backtracking, and, when given a declarative exemplar, CodeLlama-13B can outperform GPT-4o on difficult Zebra puzzles.

Large Reasoning Models (LRMs) achieve strong performance on complex reasoning tasks by generating long Chains of Thought (CoTs). However, this paradigm might incur substantial token overhead, especially when models "overthink" by producing lengthy reasoning chains, which can even lead to incorrect answers. A promising direction is the symbolic-solver-integrated approach, which leverages the code generation capabilities of LLMs to translate reasoning tasks into executable code and then solve them with a symbolic solver. In this paper, we explore an open question of when the conventional long-CoT can be enhanced by symbolic solvers. Our experimental results show that the symbolic-solver-integrated method only helps when the problem requires limited implicit reasoning but involves an ample search space. The latest LLMs, like GPT-4o, show better performance on deductive problems with shallow reasoning depth, while the symbolic-solver-integrated method significantly improves the LLMs' performance in constraint satisfaction problems that require repeated backtracks. When a declarative exemplar is provided, even CodeLlama-13B can outperform GPT-4o in difficult Zebra puzzles.
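To make the symbolic-solver-integrated idea concrete, here is a minimal sketch of what a declarative encoding of a tiny Zebra-style puzzle might look like once an LLM has translated it into code. It is an illustration only, not the paper's solver, prompts, or exemplar: the three-house puzzle, the `solve` function, and every variable name are hypothetical, and a hand-rolled backtracking search stands in for a real symbolic solver.

```python
# Hypothetical mini Zebra puzzle: each attribute is a variable whose value
# is the house number (1..3) it belongs to. Everything here is an assumption
# made for illustration; none of it comes from the paper.

def solve(variables, domains, constraints):
    """Depth-first backtracking search. Returns the first full assignment
    that satisfies every constraint, or None if no assignment exists."""

    def consistent(assignment):
        # Check only constraints whose variables are all assigned so far.
        for scope, predicate in constraints:
            if all(v in assignment for v in scope):
                if not predicate(*(assignment[v] for v in scope)):
                    return False
        return True

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return dict(assignment)
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment):
                result = backtrack(assignment)
                if result is not None:
                    return result
            del assignment[var]  # undo the choice and try the next value
        return None

    return backtrack({})


def all_different(group):
    """Pairwise inequality constraints over a group of variables."""
    return [((a, b), lambda x, y: x != y)
            for i, a in enumerate(group) for b in group[i + 1:]]


HOUSES = (1, 2, 3)
COLORS = ("red", "green", "blue")
PETS = ("cat", "dog", "fish")
VARIABLES = COLORS + PETS
DOMAINS = {v: HOUSES for v in VARIABLES}

# Declarative part: the clues are stated as constraints, not as a procedure.
CONSTRAINTS = all_different(COLORS) + all_different(PETS) + [
    (("red", "green"), lambda r, g: g == r + 1),  # green is right of red
    (("blue",), lambda b: b != 1),                # blue is not the first house
    (("dog", "red"), lambda d, r: d == r),        # the dog lives in the red house
    (("cat", "blue"), lambda c, b: c == b),       # the cat lives in the blue house
]

solution = solve(VARIABLES, DOMAINS, CONSTRAINTS)
print(solution)
```

The point of the design is the split between the clues and the search: the model only has to state the constraints correctly, and the repeated backtracking that a long CoT would otherwise have to carry out in tokens is done by the search routine.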

