CL · LG · Nov 25, 2022

CodeExp: Explanatory Code Document Generation

arXiv:2211.15395v1 · 292 citations · h-index: 66 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for implementation-level code explanations to aid software maintenance and programming education, though it is an incremental advance, since it builds on existing code-to-text generation models.

The authors tackled the problem of generating detailed code explanations by proposing a new task, collecting a refined dataset, and developing evaluation metrics and a fine-tuning strategy, resulting in models that outperform those trained on 15x larger unrefined data and produce docstrings comparable to human-written ones.

Developing models that can automatically generate detailed code explanations can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code that do not capture the implementation-level choices essential for these scenarios. To fill this gap, we propose the code explanation generation task. We first conducted a human study to identify the criteria for high-quality explanatory docstrings for code. Based on that, we collected and refined a large-scale code docstring corpus and formulated automatic evaluation metrics that best match human assessments. Finally, we present a multi-stage fine-tuning strategy and baseline models for the task. Our experiments show that (1) our refined training dataset lets models achieve better performance on the explanation generation task than unrefined data that is 15x larger, and (2) fine-tuned models can generate well-structured long docstrings comparable to human-written ones. We envision that our training dataset, human-evaluation protocol, recommended metrics, and fine-tuning strategy can boost future code explanation research. The code and annotated data are available at https://github.com/subercui/CodeExp.
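The abstract says the recommended automatic metrics were chosen because they best match human assessments. One common way to quantify that kind of agreement is a rank correlation between each candidate metric's scores and the human ratings of the same set of generated docstrings. The sketch below assumes Spearman's rho (the abstract does not state which statistic the authors used), and the metric names `metric_a` and `metric_b` and their scores are hypothetical placeholders, not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def rank_metrics_by_agreement(metric_scores, human_scores):
    """Return (metric_name, rho) pairs, best agreement with humans first."""
    results = {name: spearman(s, human_scores) for name, s in metric_scores.items()}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: five docstrings rated by humans and by two metrics.
human = [4, 2, 5, 1, 3]
scores = {
    "metric_a": [0.6, 0.3, 0.9, 0.1, 0.5],
    "metric_b": [0.2, 0.8, 0.4, 0.5, 0.1],
}
for name, rho in rank_metrics_by_agreement(scores, human):
    print(f"{name}: {rho:.2f}")
# metric_a: 1.00
# metric_b: -0.50
```

Spearman's rho only assumes a monotone relationship between metric and human scores, which suits ordinal ratings; Kendall's tau is a common alternative. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.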

Code Implementations: 1 repo
