SEJun 18, 2020

Prioritizing documentation effort: Can we do better?

arXiv:2006.10892v1Has Code
Originality Incremental advance
AI Analysis

This addresses the challenge for software developers who lack time to document all code, offering an incremental improvement over existing methods.

The paper tackles the problem of prioritizing which code modules to document by proposing an unsupervised PageRank-based method that uses module dependencies, eliminating the need for labeled training data. It shows this approach outperforms a prior supervised neural network method on six large datasets.

Code documentations are essential for software quality assurance, but due to time or economic pressures, code developers are often unable to write documents for all modules in a project. Recently, a supervised artificial neural network (ANN) approach is proposed to prioritize important modules for documentation effort. However, as a supervised approach, there is a need to use labeled training data to train the prediction model, which may not be easy to obtain in practice. Furthermore, it is unclear whether the ANN approach is generalizable, as it is only evaluated on several small data sets. In this paper, we propose an unsupervised approach based on PageRank to prioritize documentation effort. This approach identifies "important" modules only based on the dependence relationships between modules in a project. As a result, the PageRank approach does not need any training data to build the prediction model. In order to evaluate the effectiveness of the PageRank approach, we use six additional large data sets to conduct the experiments in addition to the same data sets collected from open-source projects as used in prior studies. The experimental results show that the PageRank approach is superior to the state-of-the-art ANN approach in prioritizing important modules for documentation effort. In particular, due to the simplicity and effectiveness, we advocate that the PageRank approach should be used as an easy-to-implement baseline in future research on documentation effort prioritization, and any new approach should be compared with it to demonstrate its effectiveness.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes