CL | AI | LG | SE | Dec 19, 2024

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation

Peking U
arXiv:2412.15118v2 | 12 citations | h-index: 23 | Has Code | ICML
Originality: Highly original
AI Analysis

This paper addresses the challenge of generating correct and efficient code for complex programming tasks, offering an incremental improvement over existing supervision methods.

The paper tackles the problem of large language models struggling with complex code generation by introducing Outcome Refining Process Supervision, which unifies process and outcome supervision through executable verification. In the reported experiments, this yields 26.9% higher correctness and 42.2% improved code efficiency.

Large Language Models excel at code generation yet struggle with complex programming tasks that demand sophisticated reasoning. To bridge this gap, traditional process supervision relies on learned reward models that require costly training data and suffer from reward misalignment, while outcome supervision fails for complex tasks that need coordinated intermediate steps. We introduce Outcome Refining Process Supervision (ORPS), which unifies process and outcome supervision by leveraging executable verification: a tree-structured search framework generates strategic alternatives, profiles execution metrics, and scores candidates via self-critique mechanisms that integrate runtime feedback with reasoning. Experiments across 5 models and 3 benchmarks show consistent gains, with 26.9% higher correctness and 42.2% improved code efficiency. The results demonstrate that ORPS enables LLMs to overcome local optima in code generation, suggesting a promising direction for combining verifiable outcomes with structured reasoning to tackle complex challenges. The code is open-sourced at: https://github.com/zhuohaoyu/ORPS
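The search loop described in the abstract (generate alternatives, execute them, profile the results, score with a critique, keep the best branches) can be illustrated with a small sketch. This is not the authors' implementation: the names `Candidate`, `profile`, `score`, `search`, `demo_expand`, and `demo_critique`, the score weights, and the convention that each candidate defines a function called `solve` are all assumptions made for illustration. In a real system, `expand` and `critique` would be LLM calls; here they are fixed stubs so the example runs on its own.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Tests = List[Tuple[Any, Any]]  # (input, expected output) pairs, assumed non-empty


@dataclass
class Candidate:
    source: str                      # program text that defines `solve`
    pass_rate: float = 0.0           # fraction of tests passed
    runtime_s: float = float("inf")  # wall-clock time for the whole test run
    score: float = 0.0


def profile(source: str, tests: Tests) -> Tuple[float, float]:
    """Execute a candidate and return (pass rate, elapsed seconds).

    Uses plain exec() for brevity; untrusted code should run in a sandbox.
    """
    namespace: dict = {}
    start = time.perf_counter()
    try:
        exec(source, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0.0, float("inf")     # candidate does not even load
    passed = 0
    for arg, expected in tests:
        try:
            passed += solve(arg) == expected
        except Exception:
            pass                     # a crash counts as a failed test
    return passed / len(tests), time.perf_counter() - start


def score(pass_rate: float, runtime_s: float, critique: float) -> float:
    """Blend execution outcomes with a critique score (weights are hypothetical)."""
    speed = 1.0 / (1.0 + runtime_s)  # in (0, 1]; 0.0 when runtime is infinite
    return 0.7 * pass_rate + 0.1 * speed + 0.2 * critique


def search(root: str,
           expand: Callable[[Candidate], List[str]],
           critique: Callable[[str, float, float], float],
           tests: Tests, beam_width: int = 2, depth: int = 2) -> Candidate:
    """Beam search over candidate programs, scored by execution plus critique."""
    def evaluate(source: str) -> Candidate:
        pass_rate, runtime_s = profile(source, tests)
        s = score(pass_rate, runtime_s, critique(source, pass_rate, runtime_s))
        return Candidate(source, pass_rate, runtime_s, s)

    beam = [evaluate(root)]
    best = beam[0]
    for _ in range(depth):
        children = [evaluate(src) for cand in beam for src in expand(cand)]
        if not children:
            break
        children.sort(key=lambda c: c.score, reverse=True)
        beam = children[:beam_width]
        if beam[0].score > best.score:
            best = beam[0]
    return best


# Stubs standing in for LLM calls (hypothetical, for demonstration only).
def demo_expand(cand: Candidate) -> List[str]:
    return ["def solve(n):\n    return n + n", "def solve(n):\n    return n * n"]


def demo_critique(source: str, pass_rate: float, runtime_s: float) -> float:
    return 0.5  # a constant critique, so execution results decide the ranking


TESTS: Tests = [(1, 1), (2, 4), (3, 9)]
ROOT = "def solve(n):\n    return 0"
```

Calling `search(ROOT, demo_expand, demo_critique, TESTS)` returns the squaring candidate, because it is the only one that passes every test. Keeping a beam of several candidates rather than a single greedy choice is what lets a search of this kind move past a locally attractive but wrong strategy.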

Code Implementations: 1 repo
