SE · Apr 10, 2020

Improved Automatic Summarization of Subroutines via Attention to File Context

Sakib Haque, Alexander LeClair, Lingfei Wu, Collin McMillan

arXiv:2004.04881v129.1118 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of generating better software documentation for programmers, but it appears incremental as it builds on existing AI-based methods by incorporating file context.

The paper tackles the problem of source code summarization by addressing the limitation of existing AI-based approaches that ignore file context, presenting a method that models file context with attention to improve summary generation. The result is an extension and improvement over recent baselines, though no concrete numbers are provided.

Software documentation largely consists of short, natural language summaries of the subroutines in the software. These summaries help programmers quickly understand what a subroutine does without having to read the source code themselves. The task of writing these descriptions is called "source code summarization" and has been a target of research for several years. Recently, AI-based approaches have superseded older, heuristic-based approaches. Yet, to date, these AI-based approaches assume that all the content needed to predict summaries is inside the subroutine itself. This assumption limits performance because many subroutines cannot be understood without surrounding context. In this paper, we present an approach that models the file context of subroutines (i.e., other subroutines in the same file) and uses an attention mechanism to find words and concepts to use in summaries. We show in an experiment that our approach extends and improves several recent baselines.
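
To make the idea of attending to file context concrete, here is a minimal sketch, not the authors' model. It assumes each of the other subroutines in the file has already been encoded as a fixed-length vector, and it uses plain dot-product attention as a stand-in, since the abstract does not specify the scoring function. The function name `attend_to_file_context` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_file_context(query, context_vectors):
    """Weight file-context vectors by their similarity to a decoder state.

    query: the current decoder state (assumed representation).
    context_vectors: one vector per other subroutine in the same file
        (assumed to be pre-encoded).
    Returns the attention weights and the weighted sum of context vectors.
    """
    # Dot-product score between the query and each context vector.
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context_vectors]
    weights = softmax(scores)
    # Weighted sum of the context vectors, one dimension at a time.
    dim = len(query)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, context_vectors))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy data: three other subroutines in the same file.
decoder_state = [1.0, 0.0]
file_context = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend_to_file_context(decoder_state, file_context)
```

In this toy setup, the subroutine whose vector is most similar to the decoder state receives the largest weight, so its contents would contribute most to the next predicted summary word.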
