SE · CL · Nov 27, 2022

Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5

arXiv:2211.14875v3293 citations
h-index: 17
Originality: Incremental advance
AI Analysis

This work aims to improve developer productivity with a more efficient debugging tool. Its contribution is incremental, since it builds on existing pretrained models and techniques.

The paper tackles automated software debugging by proposing CodeT5-DLR, a unified framework built on CodeT5 that integrates bug detection, localization, and repair into a single model. The authors report that it significantly outperforms existing baselines on Java and Python datasets.

Automated software debugging is a crucial task for improving the productivity of software developers. Many neural techniques have proven effective for debugging-related tasks such as bug localization and program repair (or bug fixing). However, these techniques often focus on only one of these tasks or approach them in a stage-wise manner, ignoring the mutual benefits between them. In this work, we propose a novel unified Detect-Localize-Repair framework, named CodeT5-DLR, which builds on the pretrained programming language model CodeT5 to address these tasks jointly. Specifically, we propose three objectives to adapt the generic CodeT5 for debugging: a bug detection objective to determine whether a given code snippet is buggy, a bug localization objective to identify the buggy lines, and a program repair objective to translate the buggy code into its fixed version. We evaluate the model on each of these tasks and on their combined setting, using two newly collected line-level debugging datasets in Java and Python. Extensive results show that our model significantly outperforms existing baselines from both the NLP and software engineering domains.
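To make the three objectives concrete, the sketch below derives one supervision signal per objective from a single (code, fixed code) pair. It is only an illustration under stated assumptions: the abstract does not describe how labels are built, so the `DebugExample` container, the `build_example` helper, and the use of a line diff to mark buggy lines are all hypothetical and are not the authors' pipeline.

```python
import difflib
from dataclasses import dataclass
from typing import List


@dataclass
class DebugExample:
    """Hypothetical container holding one target per objective."""
    source: str             # code snippet given to the model
    is_buggy: bool          # detection target: is the snippet buggy?
    buggy_lines: List[int]  # localization target: 0-based buggy line indices
    repair_target: str      # repair target: the fixed version of the code


def build_example(code: str, fixed: str) -> DebugExample:
    """Derive detection, localization, and repair targets from a code pair.

    Assumption: a source line counts as buggy if a line-level diff against
    the fixed version replaces or deletes it. Fixes that only insert new
    lines mark no source line in this simplified sketch.
    """
    src_lines = code.splitlines()
    fix_lines = fixed.splitlines()
    matcher = difflib.SequenceMatcher(a=src_lines, b=fix_lines, autojunk=False)

    buggy_lines: List[int] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            buggy_lines.extend(range(i1, i2))

    return DebugExample(
        source=code,
        is_buggy=(code != fixed),
        buggy_lines=buggy_lines,
        repair_target=fixed,
    )


buggy_code = "def add(a, b):\n    return a - b\n"
fixed_code = "def add(a, b):\n    return a + b\n"
example = build_example(buggy_code, fixed_code)
print(example.is_buggy, example.buggy_lines)
```

Passing a correct snippet as both arguments yields a negative detection example with no buggy lines, which is one simple way to let a single model see both buggy and clean inputs.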
