AISep 17, 2024

Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent

Fatemeh Haji, Mazal Bethany, Maryam Tabar, Jason Chiang, Anthony Rios, Peyman Najafirad

arXiv:2409.11527v217.214 citationsh-index: 17Has Code

Originality Incremental advance

AI Analysis

This addresses the need for more trustworthy and systematic reasoning in LLMs for complex question-answering tasks, though it is incremental as it builds on existing multi-agent and ToT methods.

The paper tackles the problem of shallow reasoning path exploration in multi-agent LLMs by introducing a Thought Validator agent to scrutinize and discard flawed paths from Tree-of-Thought Reasoner agents, resulting in a 5.6% average improvement over standard ToT on the GSM8K dataset.

Multi-agent strategies have emerged as a promising approach to enhance the reasoning abilities of Large Language Models (LLMs) by assigning specialized roles in the problem-solving process. Concurrently, Tree of Thoughts (ToT) methods have shown potential in improving reasoning for complex question-answering tasks by exploring diverse reasoning paths. A critical limitation in multi-agent reasoning is the 'Reasoner' agent's shallow exploration of reasoning paths. While ToT strategies could help mitigate this problem, they may generate flawed reasoning branches, which could harm the trustworthiness of the final answer. To leverage the strengths of both multi-agent reasoning and ToT strategies, we introduce a novel approach combining ToT-based Reasoner agents with a Thought Validator agent. Multiple Reasoner agents operate in parallel, employing ToT to explore diverse reasoning paths. The Thought Validator then scrutinizes these paths, considering a Reasoner's conclusion only if its reasoning is valid. This method enables a more robust voting strategy by discarding faulty reasoning paths, enhancing the system's ability to tackle tasks requiring systematic and trustworthy reasoning. Our method demonstrates superior performance compared to existing techniques when evaluated on the GSM8K dataset, outperforming the standard ToT strategy by an average 5.6% across four LLMs. The code and related content can be found in: https://github.com/SecureAIAutonomyLab/MA-ToT

View on arXiv PDF Code

Similar