AI · Jan 23, 2025

Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning

Yulan Hu, Sheng Ouyang, Jinman Zhao, Yong Liu

arXiv:2501.13622v4 · 15.67 citations · h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses redundancy in the reasoning steps that LLMs generate for mathematical problems. The contribution appears incremental because it builds on existing process reward modeling.

To reduce redundant reasoning steps, the paper proposes CFPRM, a coarse-to-fine strategy that first merges adjacent steps and then refines them at multiple granularities. Experiments on two datasets with three loss criteria support its effectiveness.

The Process Reward Model (PRM) plays a crucial role in mathematical reasoning tasks, requiring high-quality supervised process data. However, we observe that reasoning steps generated by Large Language Models (LLMs) often fail to exhibit strictly incremental information, leading to redundancy that can hinder effective reasoning. To address this issue, we propose CFPRM, a simple yet effective coarse-to-fine strategy. Instead of focusing on the detection of redundant steps, our approach first establishes a coarse-grained window to merge adjacent reasoning steps into unified, holistic steps. The window size is then progressively reduced to extract fine-grained reasoning steps, enabling data collection at multiple granularities for training. By leveraging this hierarchical refinement process, CFPRM mitigates redundancy while preserving essential fine-grained knowledge. Extensive experiments on two reasoning datasets across three loss criteria validate CFPRM's effectiveness and versatility.
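The windowed merging described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes non-overlapping windows, a window size that shrinks by one down to 1, and a merged step that counts as correct only when every step inside it is correct. The abstract specifies none of these details, and the function names `merge_with_window` and `coarse_to_fine` are hypothetical.

```python
from typing import Dict, List, Tuple

# A merged step is (text, label), with label 1 = correct and 0 = incorrect.
MergedStep = Tuple[str, int]


def merge_with_window(steps: List[str], labels: List[int], window: int) -> List[MergedStep]:
    """Merge adjacent steps into non-overlapping chunks of `window` steps.

    Assumption (not from the paper): a merged step inherits the minimum
    label of its parts, so one wrong step makes the whole chunk wrong.
    """
    merged = []
    for start in range(0, len(steps), window):
        chunk_steps = steps[start:start + window]
        chunk_labels = labels[start:start + window]
        merged.append((" ".join(chunk_steps), min(chunk_labels)))
    return merged


def coarse_to_fine(steps: List[str], labels: List[int], max_window: int) -> Dict[int, List[MergedStep]]:
    """Collect training samples at every granularity, coarsest first.

    Assumption (not from the paper): the window shrinks by 1 per round
    until it reaches 1, which reproduces the original fine-grained steps.
    """
    if len(steps) != len(labels):
        raise ValueError("steps and labels must have the same length")
    if max_window < 1:
        raise ValueError("max_window must be at least 1")
    return {
        window: merge_with_window(steps, labels, window)
        for window in range(max_window, 0, -1)
    }
```

Calling `coarse_to_fine(["a", "b", "c", "d", "e"], [1, 1, 1, 0, 0], max_window=3)` yields two merged steps at window 3, three at window 2, and the five original steps at window 1. A PRM could then be trained on the union of these samples.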

