SE / AI, May 29, 2025

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

arXiv:2505.23387v3
11 citations
h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck for real-world deployment of LLMs in code generation, the inefficiency of generated code, by enabling models to self-improve code efficiency. The advance is incremental because it builds on existing RL methods.

The paper tackles the problem of LLMs generating inefficient code by introducing a test-time iterative optimization framework trained with reinforcement learning on execution feedback. With GRPO, pass@1 rises from 47% to 62%, and the likelihood of outperforming human submissions in efficiency rises from 31% to 45%.
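To make the closed loop concrete, here is a minimal sketch of test-time iterative refinement. It is not the paper's implementation: `measure`, `refine`, and `make_proposer` are hypothetical names, the "model" is a stub that replays a fixed list of candidates, and plain wall-clock timing stands in for the execution sandbox.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

Solution = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def measure(candidate: Solution, tests: List[TestCase]) -> Optional[float]:
    """Run all tests; return elapsed seconds, or None if any test fails."""
    start = time.perf_counter()
    for arg, expected in tests:
        if candidate(arg) != expected:
            return None
    return time.perf_counter() - start


def refine(propose, tests: List[TestCase], initial: Solution, rounds: int = 3):
    """Keep the fastest correct candidate seen across `rounds` proposals."""
    best, best_time = initial, measure(initial, tests)
    for _ in range(rounds):
        # The proposer sees the current best and its measured time (the feedback).
        candidate = propose(best, best_time)
        elapsed = measure(candidate, tests)
        if elapsed is not None and (best_time is None or elapsed < best_time):
            best, best_time = candidate, elapsed
    return best, best_time


def make_proposer(candidates: Iterable[Solution]):
    """Stub standing in for an LLM: replays fixed candidates, then repeats the best."""
    it = iter(candidates)
    return lambda current, elapsed: next(it, current)


def slow_sum(n: int) -> int:   # correct but O(n)
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def wrong_sum(n: int) -> int:  # fast but incorrect, so it must be rejected
    return n * n


def fast_sum(n: int) -> int:   # correct and O(1)
    return n * (n + 1) // 2


TESTS = [(n, n * (n + 1) // 2) for n in (10, 1000, 200_000)]
best, best_time = refine(make_proposer([wrong_sum, fast_sum]), TESTS, slow_sum)
print(best.__name__, best_time)
```

In this sketch correctness gates efficiency: a candidate that fails any test is discarded regardless of its speed, so a replacement is accepted only when it is both correct and faster than the current best.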

Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to address this, employing a closed-loop system where LLMs iteratively refine code based on empirical performance feedback from an execution sandbox. We explore three training strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Experiments on our Venus dataset and the APPS benchmark show that SFT and DPO rapidly saturate in efficiency gains. In contrast, GRPO, using reinforcement learning (RL) with execution feedback, continuously optimizes code performance, significantly boosting both pass@1 (from 47% to 62%) and the likelihood of outperforming human submissions in efficiency (from 31% to 45%). Our work demonstrates effective test-time code efficiency improvement and critically reveals the power of RL in teaching LLMs to truly self-improve code efficiency.
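As a rough illustration of the group-relative signal that GRPO is named for, the sketch below scores a group of sampled solutions to one problem and normalizes each reward against the group mean and standard deviation. The reward function (`toy_reward`: zero for a failing solution, otherwise baseline runtime divided by measured runtime) is an assumption made for illustration and is not taken from the paper; the use of the population standard deviation and the `eps` term are likewise choices of this sketch.

```python
from statistics import mean, pstdev
from typing import List


def toy_reward(passed: bool, runtime: float, baseline: float) -> float:
    """Hypothetical reward: 0 for incorrect code, speedup over a baseline otherwise."""
    return baseline / runtime if passed else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group (mean 0, unit scale)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled solutions to one problem: (passed, runtime in seconds).
samples = [(False, 0.10), (True, 1.00), (True, 0.50), (True, 1.00)]
rewards = [toy_reward(ok, t, baseline=1.0) for ok, t in samples]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Because the advantages are relative to the group, a correct but slow sample can receive a zero or negative advantage when its siblings are faster, which keeps pressure on efficiency even after most samples are correct.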

