SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition
SELT addresses unreliable reasoning in LLMs, a known bottleneck for users of AI and NLP applications, and proposes a novel method to mitigate it.
The paper tackles the performance degradation of Large Language Models (LLMs) on complex reasoning tasks by introducing SELT, a framework that combines a modified Monte Carlo Tree Search with task decomposition. It reports significant improvements in answer accuracy on benchmarks such as MMLU and Seal-Tools.
While Large Language Models (LLMs) have achieved remarkable success in a wide range of applications, their performance often degrades on complex reasoning tasks. In this work, we introduce SELT (Self-Evaluation LLM Tree Search), a novel framework that leverages a modified Monte Carlo Tree Search (MCTS) to enhance LLM reasoning without relying on external reward models. SELT redefines the Upper Confidence Bound scoring to align with the intrinsic self-evaluation capabilities of LLMs, and it decomposes the inference process into atomic subtasks augmented with semantic clustering at each node. Together, these changes balance exploration and exploitation, reduce redundant reasoning paths, and mitigate hallucination. We validate our approach on challenging benchmarks, including the knowledge-based MMLU and the tool-learning dataset Seal-Tools, where SELT achieves significant improvements in answer accuracy and reasoning robustness compared to baseline methods. Notably, our framework operates without task-specific fine-tuning, demonstrating strong generalizability across diverse reasoning tasks. Relevant results and code are available at https://github.com/fairyshine/SELT.
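To make the role of Upper Confidence Bound scoring concrete, here is a minimal Python sketch of the standard UCT selection rule in which the backed-up reward is a self-evaluation score rather than the output of an external reward model. The abstract does not state SELT's redefined formula, so this is not the authors' method: the `Node` class, the `uct_score`, `select_child`, and `backup` functions, the exploration constant `c`, and the assumption that self-evaluation scores lie in [0, 1] are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One candidate reasoning step in the search tree (hypothetical structure)."""
    text: str
    visits: int = 0
    total_score: float = 0.0  # sum of self-evaluation scores backed up through this node
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT: mean backed-up score plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_score / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backup(path: List[Node], self_eval: float) -> None:
    """Propagate one self-evaluation score (assumed in [0, 1]) along a root-to-leaf path."""
    for node in path:
        node.visits += 1
        node.total_score += self_eval
```

In a full loop, the self-evaluation score would come from prompting the LLM to rate its own partial answer; that call is left out here because its prompt and scale are not described in the abstract.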