LG, AI, CL, SE | Jun 6, 2023

Large Language Models of Code Fail at Completing Code with Potential Bugs

Amazon
arXiv:2306.03438v2 | 53 citations | h-index: 99
AI Analysis

This work addresses a critical issue for developers who use AI-assisted coding tools: it exposes a major limitation in real-world settings, where the code context often contains bugs. The contribution is incremental, however, since it builds on existing code completion research.

The paper studies code completion by large language models when the input context contains potential bugs. It finds that the presence of a potential bug significantly degrades performance; for example, the passing rates of CODEGEN-2B-MONO on the synthetic buggy-HumanEval dataset drop by more than 50% given a single potential bug in the context.

Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs -- anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of the high-performing Code-LLMs. For instance, the passing rates of CODEGEN-2B-MONO on test cases of buggy-HumanEval drop more than 50% given a single potential bug in the context. Finally, we investigate several post-hoc methods for mitigating the adverse effect of potential bugs and find that there remains a significant gap in post-mitigation performance.
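To make the idea of a "semantics-altering operator change" concrete, here is a minimal sketch in Python. It is not the authors' dataset construction pipeline: the `OPPOSITE` mapping, the `FlipFirstOperator` class, and the `inject_bug` and `run` helpers are hypothetical names introduced only for illustration, and the sample function is a HumanEval-style example assumed for the demo. The sketch flips the first supported operator in a function and shows that a test case which passed before now fails.

```python
import ast

# Hypothetical operator-to-opposite mapping (an assumption, not the paper's list).
OPPOSITE = {
    ast.Add: ast.Sub, ast.Sub: ast.Add,
    ast.Lt: ast.GtE, ast.GtE: ast.Lt,
    ast.Gt: ast.LtE, ast.LtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


class FlipFirstOperator(ast.NodeTransformer):
    """Flip the first supported operator found in pre-order traversal."""

    def __init__(self):
        self.done = False

    def visit_BinOp(self, node):
        if not self.done and type(node.op) in OPPOSITE:
            node.op = OPPOSITE[type(node.op)]()
            self.done = True
        self.generic_visit(node)
        return node

    def visit_Compare(self, node):
        if not self.done and type(node.ops[0]) in OPPOSITE:
            node.ops[0] = OPPOSITE[type(node.ops[0])]()
            self.done = True
        self.generic_visit(node)
        return node


def inject_bug(source: str) -> str:
    """Return the source with one operator replaced by its opposite."""
    tree = FlipFirstOperator().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


# Sample function assumed for illustration only.
CLEAN = '''
def has_close_elements(numbers, threshold):
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            if i != j and abs(a - b) < threshold:
                return True
    return False
'''


def run(source: str, *args):
    """Execute the source and call the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["has_close_elements"](*args)


BUGGY = inject_bug(CLEAN)
clean_result = run(CLEAN, [1.0, 2.0, 3.0], 0.5)
buggy_result = run(BUGGY, [1.0, 2.0, 3.0], 0.5)
print(BUGGY)
print("clean:", clean_result, "buggy:", buggy_result)
```

In the buggy-code completion setting described above, only a prefix of such altered code would be given to the model as context; this sketch applies the change to a whole function simply so the behavioral difference can be executed and checked.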

Code Implementations: 1 repo
