CoditT5: Pretraining for Source Code and Natural Language Editing
This work addresses the challenge of software editing by giving developers a more effective model, though the contribution is incremental because it builds on existing pretraining methods.
The authors address the poor fit between pretrained language models and software editing tasks by proposing a novel pretraining objective that explicitly models edits. The resulting model, CoditT5, outperforms standard generation-based models on comment updating, bug fixing, and automated code review; combined with a standard generation model through simple reranking, it achieves state-of-the-art performance on all three tasks.
Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.
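The abstract does not spell out the reranking strategies, so the following is only a minimal sketch of one common form: candidates proposed by one model are rescored by the other, and the two log-probabilities are mixed with a weight. The function name `rerank`, the `weight` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def rerank(
    candidates: List[Tuple[str, float]],
    rescore: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-order candidates by a weighted mix of two log-probabilities.

    candidates: (text, log-probability) pairs from the proposing model.
    rescore:    returns the second model's log-probability for a text.
    weight:     share given to the second model's score (assumed, not from the paper).
    """
    combined = [
        (text, (1.0 - weight) * base_logp + weight * rescore(text))
        for text, base_logp in candidates
    ]
    # Highest combined score first.
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Toy example: made-up scores standing in for real model outputs.
generation_candidates = [("return a + b;", -0.4), ("return a - b;", -0.3)]
toy_edit_scores = {"return a + b;": -0.2, "return a - b;": -1.5}

ranked = rerank(generation_candidates, lambda text: toy_edit_scores[text])
best = ranked[0][0]
```

In this toy run the proposing model slightly prefers the second candidate, but the rescoring model penalizes it heavily, so the first candidate ranks highest after mixing. The roles are symmetric: either model can propose candidates while the other rescores them.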