SE | CL | Mar 27, 2024

CYCLE: Learning to Self-Refine the Code Generation

arXiv:2403.18746v1 | 75 citations | h-index: 15 | Proc. ACM Program. Lang.
Originality: Incremental advance
AI Analysis

This work addresses a critical issue for developers who struggle to debug AI-generated code. It offers a novel method to make code generation more reliable, though its improvement over existing model capabilities is incremental.

The paper tackles the inability of code language models to efficiently self-refine their own faulty generations. It proposes the CYCLE framework, which learns to self-refine based on feedback such as test suite results. CYCLE boosts performance by up to 63.5% across benchmarks and outperforms models with three times more parameters in self-refinement.

Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. When code LMs fail to implement the correct program, developers find it hard to debug and fix the faulty prediction, since they did not write it themselves. Unfortunately, our study reveals that code LMs cannot efficiently self-refine their faulty generations either. In this paper, we propose the CYCLE framework, which learns to self-refine a faulty generation according to the available feedback, such as the execution results reported by test suites. We evaluate CYCLE on three popular code generation benchmarks: HumanEval, MBPP, and APPS. The results reveal that CYCLE maintains, and sometimes improves, the quality of one-time code generation while significantly improving the self-refinement capability of code LMs. We implement four variants of CYCLE with 350M, 1B, 2B, and 3B parameters, and the experiments show that CYCLE consistently boosts code generation performance, by up to 63.5%, across benchmarks and model sizes. We also find that CYCLE outperforms code LMs that have 3× more parameters in self-refinement.
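The generate, test, and refine loop that the abstract describes can be sketched roughly as follows. This is only an illustration of refining a faulty generation from test-suite feedback at inference time. It is not the CYCLE training procedure or the authors' code. The names `run_tests`, `self_refine`, and `toy_model` are hypothetical, and the "model" is a hard-coded stand-in for a code LM.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface (an assumption, not from the paper): a generator maps
# (problem, previous code, execution feedback) to a new candidate program.
Generator = Callable[[str, Optional[str], Optional[str]], str]


def run_tests(code: str, tests: List[str]) -> Tuple[bool, str]:
    """Run assert-style tests against candidate code; return (passed, feedback).

    Uses exec() for brevity. Real systems should run untrusted code in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)      # define the candidate function(s)
        for test in tests:
            exec(test, namespace)  # each test is a single assert statement
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all tests passed"


def self_refine(
    problem: str,
    generate: Generator,
    tests: List[str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Generate code, then regenerate with execution feedback until tests pass."""
    code: Optional[str] = None
    feedback: Optional[str] = None
    history: List[str] = []
    for _ in range(max_rounds):
        code = generate(problem, code, feedback)
        passed, feedback = run_tests(code, tests)
        history.append(feedback)
        if passed:
            break
    return code or "", history


def toy_model(problem: str, prev_code: Optional[str], feedback: Optional[str]) -> str:
    """Stand-in for a code LM: the first attempt is buggy, the refinement is fixed."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    final_code, log = self_refine(
        "Write add(a, b).", toy_model, ["assert add(2, 3) == 5"]
    )
    print(log)
    print(final_code)
```

In this toy run the first candidate fails its test, the failure message is passed back to the generator, and the second candidate passes. A real setup would replace `toy_model` with a call to a code LM that conditions on the problem, the faulty code, and the feedback.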

Code Implementations: 1 repo