SE · Jul 22, 2021

An Empirical Study on Code Comment Completion

Antonio Mastropaolo, Emad Aghajani, Luca Pascarella, Gabriele Bavota

arXiv:2107.10544v1

Originality: Incremental advance

AI Analysis

This work aims to improve developer productivity incrementally by simplifying comment writing: instead of generating a comment from scratch, it completes the comment a developer is already typing.

The paper tackles code comment completion by comparing an n-gram model with the T5 architecture. It finds T5 superior, although the n-gram model remains competitive: T5 achieves a BLEU score of 0.45 versus 0.38 for the n-gram model on a dataset of 100,000 comments.

Code comments play a prominent role in program comprehension activities. However, source code is not always documented, and code and comments do not always co-evolve. To deal with these issues, researchers have proposed techniques to automatically generate comments documenting the code at hand. The most recent works in the area applied deep learning (DL) techniques to support such a task. Despite the achieved advances, the empirical evaluations of these approaches show that they are still far from a performance level that would make them valuable for developers. We tackle a simpler, related problem: code comment completion. Instead of generating a comment for a given piece of code from scratch, we investigate the extent to which state-of-the-art techniques can help developers write comments faster. We present a large-scale study in which we empirically assess how well a simple n-gram model and the recently proposed Text-To-Text Transfer Transformer (T5) architecture can autocomplete a code comment the developer is typing. The results show the superiority of the T5 model, although the n-gram model is a competitive solution.
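The abstract does not detail how the n-gram baseline works. As a rough illustration of how an n-gram model can autocomplete a comment while it is being typed, the sketch below counts which token follows each two-token context in a small corpus of comments, then greedily extends a typed prefix with the most frequent next token. The function names `train_ngram` and `complete`, the choice of n = 3, greedy decoding, the toy corpus, and the absence of smoothing or back-off are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_ngram(comments, n=3):
    """Map each (n-1)-token context to a Counter of the tokens that follow it."""
    assert n >= 2, "need at least one token of context"
    counts = defaultdict(Counter)
    for comment in comments:
        tokens = [START] * (n - 1) + comment.split() + [END]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            counts[context][tokens[i]] += 1
    return counts


def complete(counts, prefix, n=3, max_tokens=20):
    """Greedily extend a partially typed comment, one most-frequent token at a time."""
    tokens = [START] * (n - 1) + prefix.split()
    completion = []
    for _ in range(max_tokens):
        context = tuple(tokens[-(n - 1):])
        if context not in counts:
            break  # unseen context: no suggestion (this sketch has no back-off)
        next_token = counts[context].most_common(1)[0][0]
        if next_token == END:
            break  # the model predicts the comment is finished
        completion.append(next_token)
        tokens.append(next_token)
    return " ".join(completion)


# Hypothetical toy corpus of method comments
corpus = [
    "Returns the number of elements in this list",
    "Returns the number of bytes read",
    "Returns the number of elements in this set",
]
model = train_ngram(corpus)
print(complete(model, "Returns the number of"))  # prints "elements in this list"
```

Greedy decoding keeps the example short; a practical completion tool would more likely rank several candidate continuations (for instance with beam search) and back off to shorter contexts when a context was never seen during training.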

