AI · Jun 2, 2025

Improving LLM-Generated Code Quality with GRPO

arXiv:2506.02211v1 · 13.65 citations · h-index: 4

Originality: Incremental advance

AI Analysis

This addresses code quality issues for developers and users of LLM-generated code, but is incremental as it builds on existing reward-based training methods.

The paper addresses the tendency of LLM-generated code to lack maintainability, quality, and safety. It develops a comprehensive library that quantifies these aspects and uses the resulting score as a reward in GRPO. The trained model produces higher-quality code, a result confirmed by blinded expert human annotators.

Large Language Models (LLMs) are gaining widespread use for code generation. Recent training procedures use execution feedback as a reward signal, typically focusing on functional correctness as measured by unit test pass rate. However, this reward signal fails to capture the maintainability, quality, and safety of the code produced. We address this under-explored area by developing a comprehensive library to quantify various aspects of code quality and using it as a reward in GRPO. We find that GRPO increases code quality according to this measure, which is confirmed by expert, blinded human annotators.
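The abstract does not specify which metrics the library computes or how the reward enters training. The snippet below is a minimal sketch, assuming a hypothetical `toy_quality_score` (docstrings, return annotations, and a penalty for bare `except:` clauses, scored with Python's `ast` module) as a stand-in for the paper's library. It feeds that score through GRPO's group-relative advantage, in which each sampled completion's reward is normalized by the mean and standard deviation of its group. Population standard deviation and the `eps` constant are illustrative choices, not the authors' settings.

```python
import ast
import statistics


def toy_quality_score(source: str) -> float:
    """Hypothetical stand-in for a code-quality library; returns a score in [0, 1]."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0  # unparseable code earns no quality reward
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if funcs:
        documented = sum(1 for f in funcs if ast.get_docstring(f) is not None)
        annotated = sum(1 for f in funcs if f.returns is not None)
        score = 0.5 * documented / len(funcs) + 0.5 * annotated / len(funcs)
    else:
        score = 0.5  # neutral score when there are no functions to inspect
    # Crude "safety" signal: penalize bare `except:` clauses.
    bare_excepts = sum(1 for n in ast.walk(tree)
                       if isinstance(n, ast.ExceptHandler) and n.type is None)
    return max(0.0, score - 0.25 * bare_excepts)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One group of completions sampled for the same prompt.
completions = [
    'def add(a: int, b: int) -> int:\n    """Add two integers."""\n    return a + b\n',
    "def add(a, b):\n    return a + b\n",
    "def add(a, b) return a + b",  # syntax error
]
rewards = [toy_quality_score(c) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Because advantages are relative within the group, a completion is rewarded for being better than its siblings rather than for clearing a fixed threshold; if every completion scores the same, all advantages are zero and the group contributes no gradient.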

