AI CL LG | Feb 19, 2025

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Yuliang Liu, Junjie Lu, Zhaoling Chen, Chaofeng Qu, Jason Klein Liu, Chonghan Liu, Zefan Cai, Yunhui Xia, Li Zhao, Jiang Bian, Chuheng Zhang, Wei Shen

Tsinghua

arXiv:2502.13943v223.820 citations | h-index: 13 | Has Code | ICML

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently training Process Reward Models (PRMs) for reasoning tasks. It offers a lower-cost construction method with improved performance, though it appears incremental because it builds on existing PRM frameworks.

The authors address the problem of automatically dividing reasoning steps for Process Reward Models (PRMs). Their method, AdaptiveStep, places step boundaries according to the model's confidence in predicting the next word rather than by rule-based methods. PRMs trained this way achieve state-of-the-art Best-of-N performance in mathematical reasoning and code generation tasks while reducing construction costs by over 30% compared to existing open-source PRMs.
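Best-of-N selection with a PRM can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the per-step scores are assumed to have already been produced by some PRM, and reducing a response's step scores with the minimum is an assumption chosen here for simplicity (the summary above does not say which aggregation is used).

```python
from typing import Callable, Sequence


def aggregate_min(step_scores: Sequence[float]) -> float:
    """Score a response by its weakest step (an assumed aggregation rule)."""
    return min(step_scores)


def best_of_n(
    candidate_step_scores: Sequence[Sequence[float]],
    aggregate: Callable[[Sequence[float]], float] = aggregate_min,
) -> int:
    """Return the index of the candidate with the highest aggregated score.

    candidate_step_scores[i] holds the per-step PRM scores of the i-th
    sampled response. Ties resolve to the earliest candidate.
    """
    if not candidate_step_scores:
        raise ValueError("at least one candidate is required")
    return max(
        range(len(candidate_step_scores)),
        key=lambda i: aggregate(candidate_step_scores[i]),
    )


# Three sampled responses with made-up per-step scores.
scores = [[0.9, 0.2, 0.8], [0.7, 0.6, 0.7], [0.95, 0.5]]
chosen = best_of_n(scores)
chosen_by_last = best_of_n(scores, aggregate=lambda s: s[-1])
```

Passing a different `aggregate` callable (for example, the score of the last step) changes which response is selected, so the aggregation rule is worth stating explicitly whenever Best-of-N numbers are compared.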

Current approaches for training Process Reward Models (PRMs) often break responses into multiple reasoning steps using rule-based techniques, such as splitting at predefined placeholder tokens or fixing the length of each reasoning step. These approaches overlook the fact that specific words do not typically mark true decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. This division method provides more decision-making information at each step, enhancing downstream tasks such as reward model learning. Moreover, our method does not require manual annotation. We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation tasks. Experimental results indicate that the resulting PRM achieves state-of-the-art Best-of-N performance, surpassing a greedy search strategy with token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. In addition, we provide a thorough analysis and case study of the PRM's performance, transferability, and generalization capabilities.
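The idea of dividing steps by model confidence can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, the per-token confidences are assumed to be the probabilities the model assigned to the tokens it generated, the fixed threshold is an illustrative parameter, and placing the boundary immediately before a low-confidence token is an assumption about where the split falls.

```python
from typing import List


def split_by_confidence(
    tokens: List[str],
    confidences: List[float],
    threshold: float,
) -> List[str]:
    """Group tokens into steps, opening a new step at each low-confidence token.

    A token whose confidence is below `threshold` is treated as a decision
    point and becomes the first token of a new step (assumed convention).
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")

    steps: List[str] = []
    current: List[str] = []
    for token, confidence in zip(tokens, confidences):
        # Close the running step when the model was unsure about this token.
        if confidence < threshold and current:
            steps.append("".join(current))
            current = []
        current.append(token)
    if current:
        steps.append("".join(current))
    return steps


# Made-up tokens and confidences; only " So" falls below the threshold.
tokens = ["2", " +", " 3", " =", " 5", ".", " So", " x", " =", " 5"]
confidences = [0.99, 0.98, 0.97, 0.99, 0.95, 0.99, 0.40, 0.90, 0.99, 0.96]
steps = split_by_confidence(tokens, confidences, threshold=0.5)
```

In this toy input the split lands before " So", yielding two steps. A rule-based splitter keyed to a fixed token or a fixed length would place boundaries regardless of where the model was actually uncertain.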
