LG · AI · CL · May 23, 2025

Value-Guided Search for Efficient Chain-of-Thought Reasoning

arXiv:2505.17373v2 · 12 citations · h-index: 37 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the computational inefficiency of large language model reasoning, a concern for AI researchers and practitioners. It is an incremental advance because it builds on existing process reward models.

The paper tackles the problem of inefficient test-time compute scaling in chain-of-thought reasoning by proposing a value-guided search method, which reduces inference FLOPs by 30% compared to majority voting while achieving better performance.

In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it to DeepSeek models for improved performance with test-time compute scaling. We find that block-wise value-guided search (VGS) with a final weighted majority vote achieves better test-time scaling than standard methods such as majority voting or best-of-n. Moreover, VGS significantly reduces the inference FLOPs required to achieve the same performance as majority voting. Our dataset, model, and codebase are open-sourced.
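The idea of block-wise value-guided search followed by a weighted majority vote can be illustrated with a minimal sketch. This is not the authors' implementation: the callables `extend`, `value`, `is_done`, and `extract_answer`, and the parameters `beam_width` and `max_blocks`, are hypothetical stand-ins for a generator that proposes the next block of tokens, a value model that scores a partial trace, a stopping check, and an answer parser. The sketch also assumes a single beam search whose finished traces are aggregated by summing their value scores per answer.

```python
from collections import defaultdict
from typing import Callable, List, Tuple


def weighted_majority_vote(answers: List[str], scores: List[float]) -> str:
    """Return the answer whose candidates have the largest total score."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


def blockwise_value_guided_search(
    prompt: str,
    extend: Callable[[str], List[str]],     # hypothetical: proposes next blocks
    value: Callable[[str], float],          # hypothetical: value model score
    is_done: Callable[[str], bool],         # hypothetical: trace has an answer
    extract_answer: Callable[[str], str],   # hypothetical: parses the answer
    beam_width: int = 2,
    max_blocks: int = 8,
) -> str:
    beams: List[str] = [prompt]
    finished: List[Tuple[str, float]] = []

    for _ in range(max_blocks):
        # Expand every live trace by one block (not by one "step").
        candidates = [trace + block for trace in beams for block in extend(trace)]
        if not candidates:
            break

        # Keep only the traces the value model rates highest.
        candidates.sort(key=value, reverse=True)
        beams = []
        for trace in candidates[:beam_width]:
            if is_done(trace):
                finished.append((trace, value(trace)))
            else:
                beams.append(trace)
        if not beams:
            break

    # Fall back to unfinished traces if nothing completed within the budget.
    if not finished:
        finished = [(trace, value(trace)) for trace in beams]

    answers = [extract_answer(trace) for trace, _ in finished]
    scores = [score for _, score in finished]
    return weighted_majority_vote(answers, scores)
```

In this sketch the value model is queried once per candidate block rather than once per token or per hand-defined step, which is one way to read the abstract's point that no fine-grained notion of "step" is needed. The block length and beam width trade compute for search quality, and their settings here are illustrative only.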

Code Implementations: 1 repo
