SE | AI | Aug 9, 2024

COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis

arXiv:2408.05006v3 | 19 citations | h-index: 11 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of evaluating and improving the code debugging abilities of LLMs for software development. The advance is incremental: it builds on existing SFT methods, to which it adds a novel data synthesis approach.

The paper tackles the limited evaluation of LLMs' code debugging abilities by introducing DEBUGEVAL, a benchmark that emulates the multi-stage human debugging process. It also proposes COAST, a framework that uses a multi-agent system to generate training data, enabling 7B-scale LLMs to achieve debugging performance comparable to GPT-3.5.

Code debugging is a vital stage of software development, essential for ensuring the reliability and performance of Large Language Models (LLMs) in the code generation task. Human debugging typically follows a multi-stage process, which includes Bug Localization, Bug Identification, Code Repair, and Code Recognition. However, existing code debugging benchmarks predominantly focus on the Code Repair stage, which offers only a limited perspective on evaluating the debugging capabilities of LLMs. In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process. Evaluating on DEBUGEVAL, we observe that 7B-scale models consistently underperform their larger counterparts, highlighting their limitations in comprehending code semantics. To address this, we propose the COmmunicative Agent-based data SynThesis (COAST) framework, which employs a multi-agent system to generate high-quality training data for supervised fine-tuning (SFT). Experimental results demonstrate that COAST-generated data outperform human-curated and GPT-4-generated data, enabling 7B-scale LLMs to achieve debugging performance comparable to GPT-3.5. All data and code are available at https://github.com/NEUIR/COAST.

Code Implementations: 1 repo
