LG AI CL · May 23, 2025

Value-Guided Search for Efficient Chain-of-Thought Reasoning

Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun

arXiv:2505.17373v2 · 12 citations · h-index: 37 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational inefficiency of large language model reasoning, a concern for AI researchers and practitioners. It is an incremental advance because it builds on existing process reward models.

The paper tackles inefficient test-time compute scaling in chain-of-thought reasoning by proposing a value-guided search method, which reduces inference FLOPs by 30% compared to majority voting while achieving better performance.

In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it to DeepSeek models for improved performance with test-time compute scaling. We find that block-wise value-guided search (VGS) with a final weighted majority vote achieves better test-time scaling than standard methods such as majority voting or best-of-n. Moreover, VGS significantly reduces the inference FLOPs required to achieve the same performance as majority voting. Our dataset, model and codebase are open-sourced.
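
The abstract's two ingredients, block-wise search guided by a value model and a final weighted majority vote, can be illustrated with a minimal sketch. This is not the authors' implementation: `block_wise_search`, `weighted_majority_vote`, and the callables `generate_block`, `value`, and `is_done` are hypothetical names, and the toy generator and scorer below merely stand in for a language model and a trained value model.

```python
import random
from collections import defaultdict
from typing import Callable, List, Sequence


def weighted_majority_vote(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Return the answer whose responses have the largest total score."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


def block_wise_search(
    prompt: str,
    generate_block: Callable[[str], str],  # assumed: samples the next block of text
    value: Callable[[str], float],         # assumed: scores a partial response
    is_done: Callable[[str], bool],        # assumed: detects a finished response
    beam_width: int = 2,
    branch: int = 2,
    max_blocks: int = 8,
) -> List[str]:
    """Beam search over blocks: expand each beam, keep the highest-valued ones."""
    beams = [prompt]
    for _ in range(max_blocks):
        if all(is_done(b) for b in beams):
            break
        candidates = []
        for b in beams:
            if is_done(b):
                candidates.append(b)  # finished beams are carried over unchanged
            else:
                candidates.extend(b + generate_block(b) for _ in range(branch))
        candidates.sort(key=value, reverse=True)
        beams = candidates[:beam_width]
    return beams


if __name__ == "__main__":
    # Toy stand-ins: "blocks" are single characters, the value counts 'a's.
    rng = random.Random(0)
    finished = block_wise_search(
        "",
        generate_block=lambda p: rng.choice("ab"),
        value=lambda p: p.count("a"),
        is_done=lambda p: len(p) >= 3,
    )
    # Treat the last character as the "answer" and vote using the value scores.
    answers = [f[-1] for f in finished]
    scores = [float(f.count("a")) for f in finished]
    print(finished, weighted_majority_vote(answers, scores))
```

Unlike plain majority voting, the weighted vote lets a single high-scoring response outweigh several low-scoring ones that agree with each other; how the scores are produced and how blocks are sized are left open here.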

