CL | Nov 7, 2024

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

arXiv:2411.05199v3 | 10 citations | h-index: 10 | Has Code | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses the resource-intensive and over-generalizing nature of LLMs for code generation, making smaller open-source models more competitive with closed-source alternatives. The advance appears incremental, since it builds on existing fine-tuning and preference learning methods.

The paper tackles the resource cost and limited task-specific efficiency of large language models for code generation by introducing CodeLutra, a framework that applies iterative preference-based refinement to both correct and incorrect code attempts to boost smaller models' performance. On a data science coding task, it improved Llama-3-8B's accuracy from 28.2% to 48.6% (a gain of 20.4 percentage points) with only 500 samples, approaching GPT-4's level.

Large Language Models (LLMs) have revolutionized code generation but require significant resources and often over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs provides a cost-effective alternative. However, standard supervised approaches rely only on correct examples, missing valuable insights from failures. We introduce CodeLutra, a framework that leverages both correct and incorrect code attempts. Instead of using only correct solutions, CodeLutra applies iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This approach narrows the performance gap with state-of-the-art larger models without requiring massive datasets or auxiliary models. For instance, on a challenging data science coding task, using only 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's level. By learning from both successes and mistakes, CodeLutra provides a scalable and efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
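The idea of comparing successful and failed outputs can be illustrated with a minimal sketch of how preference pairs might be assembled from a model's own attempts. This is not the paper's implementation: the names `PreferencePair`, `build_pairs`, and `runs_and_returns` are hypothetical, and the execution-based check (compare a `result` variable against an expected value) is an assumption made only for illustration.

```python
from itertools import product
from typing import Callable, List, NamedTuple


class PreferencePair(NamedTuple):
    prompt: str
    chosen: str    # an attempt that passed the correctness check
    rejected: str  # an attempt that failed the correctness check


def build_pairs(
    prompt: str,
    candidates: List[str],
    is_correct: Callable[[str], bool],
    max_pairs: int = 4,
) -> List[PreferencePair]:
    """Pair every passing attempt with every failing one, capped at max_pairs."""
    verdicts = [(code, is_correct(code)) for code in candidates]
    passed = [code for code, ok in verdicts if ok]
    failed = [code for code, ok in verdicts if not ok]
    pairs = [PreferencePair(prompt, good, bad) for good, bad in product(passed, failed)]
    return pairs[:max_pairs]


def runs_and_returns(code: str, expected: object) -> bool:
    """Hypothetical checker: run the snippet and compare its `result` variable.

    Assumption for illustration only; a real setup would sandbox execution.
    """
    scope: dict = {}
    try:
        exec(code, scope)
    except Exception:
        return False
    return scope.get("result") == expected


# Three sampled attempts for one prompt: one correct, one wrong, one crashing.
candidates = [
    "result = sum([1, 2, 3])",
    "result = max([1, 2, 3])",
    "result = undefined_name + 1",
]
pairs = build_pairs(
    "Sum the list [1, 2, 3].",
    candidates,
    lambda code: runs_and_returns(code, 6),
)
for pair in pairs:
    print(pair.chosen, "preferred over", pair.rejected)
```

In an iterative setting, pairs like these would be rebuilt from the updated model's outputs at each round and passed to a preference-learning objective; the abstract does not specify that objective, so it is left out of the sketch.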

