SE, Jul 22, 2021

An Empirical Study on Code Comment Completion

arXiv:2107.10544v1, 28 citations
Originality: Incremental advance
AI Analysis

This work targets an incremental improvement in developer productivity: it helps developers finish the comments they are typing rather than generating whole comments from scratch.

The paper tackles code comment completion by comparing an n-gram model with the T5 architecture. It finds T5 superior, although the n-gram model remains competitive: T5 reportedly achieves a BLEU score of 0.45 against 0.38 for the n-gram model on a dataset of 100,000 comments.

Code comments play a prominent role in program comprehension activities. However, source code is not always documented, and code and comments do not always co-evolve. To deal with these issues, researchers have proposed techniques to automatically generate comments documenting a given piece of code. The most recent works in the area applied deep learning (DL) techniques to support such a task. Despite the achieved advances, the empirical evaluations of these approaches show that they are still far from a performance level that would make them valuable for developers. We tackle a simpler, related problem: code comment completion. Instead of generating a comment for a given piece of code from scratch, we investigate the extent to which state-of-the-art techniques can help developers write comments faster. We present a large-scale study in which we empirically assess how a simple n-gram model and the recently proposed Text-To-Text Transfer Transformer (T5) architecture perform in autocompleting a code comment the developer is typing. The achieved results show the superiority of the T5 model, although the n-gram model remains a competitive solution.
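To make the idea of n-gram comment completion concrete, here is a minimal sketch of how such a model could suggest the next words of a comment being typed. It is an illustration under stated assumptions, not the authors' implementation: the function names `train_ngram` and `complete`, the whitespace tokenization, the greedy decoding, and the toy corpus are all hypothetical, and the abstract does not state the n-gram order or smoothing used in the study.

```python
from collections import Counter, defaultdict


def train_ngram(comments, n=3):
    """Count which token follows each (n-1)-token context (hypothetical helper)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    counts = defaultdict(Counter)
    for comment in comments:
        # Pad so the first real tokens also have a full-length context.
        tokens = ["<s>"] * (n - 1) + comment.lower().split() + ["</s>"]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            counts[context][tokens[i]] += 1
    return counts


def complete(counts, prefix, n=3, max_tokens=5):
    """Greedily extend a typed prefix with the most frequent next tokens."""
    if n < 2:
        raise ValueError("n must be at least 2")
    tokens = ["<s>"] * (n - 1) + prefix.lower().split()
    completion = []
    for _ in range(max_tokens):
        context = tuple(tokens[-(n - 1):])
        followers = counts.get(context)
        if not followers:
            break  # unseen context: no suggestion (no smoothing in this sketch)
        next_token = followers.most_common(1)[0][0]
        if next_token == "</s>":
            break  # the model predicts the comment ends here
        completion.append(next_token)
        tokens.append(next_token)
    return completion


# Toy corpus, made up for illustration only.
corpus = [
    "returns the number of elements in the list",
    "returns the number of bytes read",
    "returns the number of elements in the map",
]
model = train_ngram(corpus, n=3)
suggestion = complete(model, "returns the number", n=3, max_tokens=4)
print(" ".join(suggestion))
```

A sketch like this only looks at the last two typed words, which is one reason a model such as T5, which can condition on a longer context, might be expected to do better; a practical n-gram model would also add smoothing or back-off for unseen contexts.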

Code Implementations: 1 repo