AI, LG · Sep 1, 2023

No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function

arXiv:2309.03224v315.216 citations

Originality: Incremental advance

AI Analysis

The work targets mathematical reasoning in fine-tuned LLMs. It is an incremental advance: the underlying fine-tuned models are left unchanged, and the contribution is a search-based enhancement applied on top of them.

The paper starts from the observation that LLMs often produce incorrect mathematical reasoning steps even when they assign high probability to the solution. It proposes Monte Carlo Tree Search (MCTS) guided by an energy function to improve reasoning without additional fine-tuning. The authors report significant improvements in pass@1 on the GSM8k and AQUA-RAT benchmarks.

Large language models (LLMs) demonstrate impressive language understanding and contextual learning abilities, making them suitable for natural language processing (NLP) tasks and complex mathematical reasoning. However, when applied to mathematical reasoning tasks, LLMs often struggle to generate correct reasoning steps and answers despite having high probabilities for the solutions. To overcome this limitation and enhance the mathematical reasoning capabilities of fine-tuned LLMs without additional fine-tuning steps, we propose a method that incorporates Monte Carlo Tree Search (MCTS) and a lightweight energy function to rank decision steps and enable immediate reaction and precise reasoning. Specifically, we re-formulate the fine-tuned LLMs into a Residual-based Energy Model (Residual-EBM) and employ noise contrastive estimation to estimate the energy function's parameters. We then utilize MCTS with the energy function as a path verifier to search the output space and evaluate the reasoning path. Through extensive experiments on two mathematical reasoning benchmarks, GSM8k and AQUA-RAT, we demonstrate the exceptional capabilities of our method, which significantly improves the pass@1 metric of the fine-tuned model without requiring additional fine-tuning or reinforcement learning with human feedback alignment.
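To make the search procedure in the abstract more concrete, the following is a minimal toy sketch of MCTS in which an energy function scores complete reasoning paths. It is not the authors' implementation. Everything in it is an assumption made for illustration: `propose` is a hypothetical stand-in for the fine-tuned LLM's next-step distribution, `energy` is a hand-written stand-in for the learned Residual-EBM (which the paper trains with noise contrastive estimation), and the PUCT-style selection rule, the greedy rollout, and the constants `STEPS`, `DEPTH`, and `TARGET` are arbitrary choices.

```python
import math

# Toy setting (all hypothetical): a "reasoning path" is a list of DEPTH integer
# steps, and a path is good when its steps sum to TARGET.
STEPS = [1, 2, 3, 4, 5]
DEPTH = 3
TARGET = 11


def propose(path):
    """Stand-in for the LLM: candidate next steps with prior probabilities.

    The prior deliberately prefers small steps, so the most probable path
    is not the best one.
    """
    weights = [5, 4, 3, 2, 1]
    total = sum(weights)
    return [(step, w / total) for step, w in zip(STEPS, weights)]


def energy(path):
    """Stand-in for the learned energy function: lower means a better path."""
    return abs(TARGET - sum(path))


def complete_greedily(path):
    """Extend a partial path to DEPTH steps by always taking the most probable step."""
    path = list(path)
    while len(path) < DEPTH:
        step, _ = max(propose(path), key=lambda sp: sp[1])
        path.append(step)
    return path


class Node:
    def __init__(self, path, prior):
        self.path = path
        self.prior = prior
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=2.0):
    """PUCT-style rule: mean value plus a prior-weighted exploration bonus."""
    def score(child):
        bonus = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.value() + bonus
    return max(node.children, key=score)


def search(n_sim=200):
    """Run MCTS and return the lowest-energy complete path seen during search."""
    root = Node([], 1.0)
    best_path, best_energy = None, float("inf")
    for _ in range(n_sim):
        # Selection: walk down the tree until reaching a leaf.
        node, trail = root, [root]
        while node.children:
            node = select_child(node)
            trail.append(node)
        # Expansion: add children unless the path is already complete.
        if len(node.path) < DEPTH:
            node.children = [Node(node.path + [s], p) for s, p in propose(node.path)]
        # Evaluation: complete the path greedily and score it with the energy.
        full_path = complete_greedily(node.path)
        e = energy(full_path)
        if e < best_energy:
            best_path, best_energy = full_path, e
        reward = -e / TARGET  # lower energy gives higher reward
        # Backpropagation: update statistics along the visited trail.
        for visited in trail:
            visited.visits += 1
            visited.value_sum += reward
    return best_path


if __name__ == "__main__":
    greedy = complete_greedily([])
    best = search()
    print("greedy path:", greedy, "energy:", energy(greedy))
    print("search path:", best, "energy:", energy(best))
```

In this toy, the energy function plays the role the abstract calls a path verifier: it evaluates completed paths and its score steers which branches the search revisits. In a real system, `propose` would query the fine-tuned LLM for candidate reasoning steps and `energy` would be the trained energy model.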
