LG AI CL | Dec 17, 2025

DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding

Ruiyi Zhang, Peijia Qin, Qi Cao, Pengtao Xie

arXiv:2512.15000v1 | 2 citations | h-index: 9

Originality: Incremental advance

AI Analysis

This work targets developer productivity by improving LLM-based code generation. The contribution is incremental: it adapts existing PRM methods to the coding domain.

The paper addresses two limitations of Process Reward Models (PRMs) when applied to LLM coding, poorly defined step decompositions and noisy labels, and reports a state-of-the-art pass@1 rate of 80.9% on LiveCodeBench, outperforming OpenAI o4-mini.

Process Reward Models (PRMs) have become essential for improving Large Language Models (LLMs) via test-time scaling, yet their effectiveness in coding remains limited due to the lack of meaningful step decompositions in code and the noise of Monte-Carlo-generated partial labels. We propose DreamPRM-Code, a coding-focused PRM that treats functions as reasoning steps, using a Chain-of-Function prompting strategy to induce modular code generation and enabling PRM training and application analogous to mathematical reasoning tasks. To address label noise, DreamPRM-Code introduces a meta-learning-based correction mechanism that leverages clean final-solution unit-test labels and performs bi-level optimization to refine intermediate labels. Applied to test-time scaling, DreamPRM-Code achieves state-of-the-art performance on LiveCodeBench with a pass@1 rate of 80.9%, surpassing OpenAI o4-mini.
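
To make the function-as-step idea concrete, here is a minimal Python sketch of how a modular candidate program might be split into function-level steps and how a step scorer might drive best-of-N selection at test time. It is an illustration, not the authors' implementation: the names `split_into_function_steps`, `score_candidate`, and `best_of_n` are hypothetical, the `step_scorer` callable stands in for a trained PRM, and averaging prefix scores is an assumed aggregation rule that the abstract does not specify.

```python
import ast
from typing import Callable, List


def split_into_function_steps(source: str) -> List[str]:
    """Return the source text of each top-level function, in file order.

    Each function plays the role of one "reasoning step".
    """
    tree = ast.parse(source)
    steps = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                steps.append(segment)
    return steps


def score_candidate(source: str, step_scorer: Callable[[str], float]) -> float:
    """Score a candidate by averaging scorer outputs over growing prefixes.

    `step_scorer` is a placeholder for a PRM: it maps a partial solution
    (the first k functions) to a score. Mean aggregation is an assumption.
    """
    steps = split_into_function_steps(source)
    if not steps:
        return 0.0
    prefix_scores = [
        step_scorer("\n\n".join(steps[:k])) for k in range(1, len(steps) + 1)
    ]
    return sum(prefix_scores) / len(prefix_scores)


def best_of_n(candidates: List[str], step_scorer: Callable[[str], float]) -> str:
    """Pick the candidate with the highest aggregated step score."""
    return max(candidates, key=lambda c: score_candidate(c, step_scorer))
```

In use, `candidates` would hold several programs sampled from the LLM under a prompt that encourages modular, function-by-function output, and `step_scorer` would wrap the reward model. Any callable that returns a float works for experimentation.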
