SE | Nov 6, 2021

Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs

arXiv:2111.03922v1 | 65 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of automating bug fixes for software developers, but its contribution is incremental: it applies an existing model to a new task.

The paper evaluated OpenAI's Codex for automatic program repair on the QuixBugs benchmark, finding it surprisingly effective and competitive with state-of-the-art techniques, with slightly better performance in Python than Java.

OpenAI's Codex, a GPT-3-like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically valid in most cases. In this work, we want to investigate whether Codex is able to localize and fix bugs, a task of central interest in the field of automated program repair (APR). Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). We find that, despite not being trained for APR, Codex is surprisingly effective, and competitive with recent state-of-the-art techniques. Our results also show that Codex is slightly more successful at repairing Python than Java.
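
To make the repair task concrete, here is a minimal sketch of how a candidate fix for a small buggy function might be checked against test cases. Everything in it is an illustrative assumption rather than material from the paper or the benchmark: the function `count_set_bits`, its defect, the test cases, and the helper `passes_tests` are hypothetical, and the "candidate patch" is a hand-written string standing in for a model's output.

```python
# Hypothetical buggy program with a defect on a single line.
# Not taken from QuixBugs; it only illustrates the shape of the task.
BUGGY_SOURCE = '''
def count_set_bits(n):
    count = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        count += 2  # defect: should increment by 1
    return count
'''

# Hypothetical candidate patch, standing in for a model's suggested fix.
CANDIDATE_PATCH = BUGGY_SOURCE.replace(
    "count += 2  # defect: should increment by 1",
    "count += 1",
)

# Hypothetical (input, expected output) pairs acting as the test suite.
TEST_CASES = [(0, 0), (1, 1), (7, 3), (8, 1), (255, 8)]


def passes_tests(source, func_name, test_cases):
    """Return True if the function defined in `source` passes every test case."""
    namespace = {}
    try:
        exec(source, namespace)  # define the function in an isolated namespace
        func = namespace[func_name]
        return all(func(arg) == expected for arg, expected in test_cases)
    except Exception:
        # A patch that fails to compile or raises at run time counts as a failure.
        return False


if __name__ == "__main__":
    print("buggy passes:  ", passes_tests(BUGGY_SOURCE, "count_set_bits", TEST_CASES))
    print("patched passes:", passes_tests(CANDIDATE_PATCH, "count_set_bits", TEST_CASES))
```

Passing a test suite is only a proxy for correctness: a patch can satisfy every listed case and still be wrong on other inputs, so test-based checks are often paired with manual inspection.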

