AIIR · Feb 20, 2025

Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning

arXiv:2502.14361v1 · 22 citations · h-index: 18 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This work addresses generalization challenges in evaluating the mathematical reasoning of AI systems. It is an incremental contribution because it builds on existing PRM methods.

The paper tackles out-of-distribution (OOD) generalization issues in Process Reward Models (PRMs) for mathematical reasoning. It introduces RetrievalPRM, which uses a retrieval-enhanced mechanism to improve the evaluation of reasoning steps and outperforms existing baselines across multiple real-world datasets.

Large language models (LLMs) have significantly advanced mathematical reasoning, and Process Reward Models (PRMs) have been developed to evaluate the logical validity of reasoning steps. However, PRMs still struggle with out-of-distribution (OOD) challenges. This paper identifies key OOD issues, including step OOD, caused by differences in reasoning patterns across model types and sizes, and question OOD, caused by dataset shifts between training data and real-world problems. To address these issues, we introduce the Retrieval-Augmented Process Reward Model (RetrievalPRM). Using a two-stage retrieval-enhanced mechanism, RetrievalPRM retrieves semantically similar questions and steps as a warmup, which enhances the PRM's ability to evaluate target steps and improves generalization and reasoning consistency across different models and problem types. Extensive experiments show that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetrievalPRM model, establishing a new standard for PRM performance.
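The abstract describes the retrieval warmup only at a high level, so the following is a minimal sketch of a two-stage retrieve-then-prompt step under stated assumptions, not the paper's implementation. It assumes a bag-of-words cosine similarity in place of a learned encoder, searches for similar steps over the whole labeled pool, and uses hypothetical names (`Example`, `retrieve`, `build_prm_input`) and a toy pool. The returned string stands in for the warmup context a PRM would read before judging the target step.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical labeled training item: a question, its steps, and step labels."""
    question: str
    steps: list[str]
    labels: list[bool]  # True if the step was labeled correct


def embed(text: str) -> Counter:
    # Stand-in "embedding" (assumption): lowercase bag-of-words counts.
    # A real retriever would use a dense sentence encoder here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, candidates, key, k):
    """Return the k candidates whose key(candidate) text is most similar to query."""
    q = embed(query)
    return sorted(candidates, key=lambda c: cosine(q, embed(key(c))), reverse=True)[:k]


def retrieve(pool, question, target_step, k_q=2, k_s=2):
    # Stage 1: questions similar to the target question.
    questions = top_k(question, pool, key=lambda ex: ex.question, k=k_q)
    # Stage 2: labeled steps similar to the target step (assumption: whole pool searched).
    all_steps = [(s, ok) for ex in pool for s, ok in zip(ex.steps, ex.labels)]
    steps = top_k(target_step, all_steps, key=lambda pair: pair[0], k=k_s)
    return questions, steps


def build_prm_input(pool, question, target_step, k_q=2, k_s=2) -> str:
    """Assemble the retrieved warmup context followed by the step to be judged."""
    questions, steps = retrieve(pool, question, target_step, k_q, k_s)
    lines = ["Similar questions:"]
    lines += [f"- {ex.question}" for ex in questions]
    lines.append("Similar steps:")
    lines += [f"- {s} [{'correct' if ok else 'incorrect'}]" for s, ok in steps]
    lines += [f"Question: {question}", f"Step to evaluate: {target_step}"]
    return "\n".join(lines)


pool = [
    Example("Solve 2x + 3 = 7 for x.",
            ["Subtract 3 from both sides: 2x = 4.", "Divide by 2: x = 2."], [True, True]),
    Example("What is the area of a circle with radius 3?",
            ["Area = pi * r^2.", "Area = 6 * pi."], [True, False]),
    Example("Solve 3x - 5 = 10 for x.",
            ["Add 5 to both sides: 3x = 15.", "Divide by 3: x = 4."], [True, False]),
]

print(build_prm_input(pool, "Solve 5x + 2 = 12 for x.", "Subtract 2 from both sides: 5x = 10."))
```

Only `embed` would need to change to use a dense encoder; the two-stage structure of retrieving questions, then steps, then prompting stays the same.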
