SE · AI · LG · ML · May 1, 2020

A Transformer-based Approach for Source Code Summarization

arXiv:2005.00653v1 · 1082 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating readable summaries of program functionality. The advance is incremental because it applies an existing Transformer model to a specific domain.

The paper tackles source code summarization with a Transformer model whose self-attention mechanism captures long-range dependencies among code tokens. It outperforms the state-of-the-art techniques by a significant margin, and its ablation studies show that relative position encoding improves summarization performance, whereas absolute position encoding hinders it.

Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representation for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies. In this work, we show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., the absolute encoding of source code tokens' positions hinders, while relative encoding significantly improves, the summarization performance. We have made our code publicly available to facilitate future research.
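To make the relative-encoding finding concrete, here is a minimal sketch of single-head self-attention in which position enters only through the clipped offset between two tokens. The abstract does not spell out the exact scheme, so this assumes learned relative-position embeddings added to keys and values, in the style of Shaw et al. (2018). The function names, the `max_dist` clipping parameter, and the use of identity query/key/value projections are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def relative_position_index(i, j, max_dist):
    """Clip the signed offset j - i to [-max_dist, max_dist], then shift to 0..2*max_dist."""
    return max(-max_dist, min(max_dist, j - i)) + max_dist


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_self_attention(x, rel_k, rel_v, max_dist):
    """Single-head self-attention with relative position embeddings.

    x:      n token vectors of dimension d, used directly as queries, keys,
            and values (identity projections, an assumption made for brevity).
    rel_k:  2*max_dist+1 vectors of dimension d, added to keys.
    rel_v:  2*max_dist+1 vectors of dimension d, added to values.
    No absolute position is used anywhere.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Score token i against every token j, with an offset-dependent key.
        scores = []
        for j in range(n):
            r = relative_position_index(i, j, max_dist)
            key = [x[j][t] + rel_k[r][t] for t in range(d)]
            scores.append(sum(q * k for q, k in zip(x[i], key)) / math.sqrt(d))
        weights = softmax(scores)
        # Weighted sum of offset-dependent values.
        row = [0.0] * d
        for j in range(n):
            r = relative_position_index(i, j, max_dist)
            for t in range(d):
                row[t] += weights[j] * (x[j][t] + rel_v[r][t])
        out.append(row)
    return out


# Toy usage with random token vectors and small random embedding tables.
random.seed(0)
n, d, max_dist = 6, 4, 2
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
rel_k = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(2 * max_dist + 1)]
rel_v = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(2 * max_dist + 1)]
out = relative_self_attention(x, rel_k, rel_v, max_dist)
print(len(out), len(out[0]))  # 6 4
```

Because offsets beyond `max_dist` share one embedding, the embedding tables stay the same size regardless of sequence length. An absolute encoding would instead add a position-specific vector to each token before attention.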

Code Implementations: 9 repos

Data from Papers with Code (CC-BY-SA-4.0)

