SE · Aug 6, 2017

CodeSum: Translate Program Language to Natural Language

arXiv:1708.01837v2 · 14 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient code comprehension among programmers, though the advance is incremental because it builds on existing Seq2Seq and attention mechanisms.

The paper tackles code comprehension in software maintenance by proposing CodeSum, a model that generates natural language descriptions of source code using an attention-based Seq2Seq neural network with Structure-based Traversal (SBT) of ASTs. It reports significantly outperforming state-of-the-art methods on large-scale corpora in Java, C#, and SQL.

During software maintenance, programmers spend a lot of time on code comprehension. Reading comments is an effective way for programmers to reduce the time spent reading and navigating source code. Therefore, as a critical task in software engineering, code summarization aims to generate brief natural language descriptions for source code. In this paper, we propose a new code summarization model named CodeSum. CodeSum exploits the attention-based sequence-to-sequence (Seq2Seq) neural network with Structure-based Traversal (SBT) of Abstract Syntax Trees (ASTs). The AST sequences generated by SBT better represent the structure of ASTs and remain unambiguous. We conduct experiments on three large-scale corpora in different programming languages, i.e., Java, C#, and SQL, of which the Java corpus is our newly proposed industry code extracted from GitHub. Experimental results show that our method, CodeSum, significantly outperforms the state of the art.
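
The abstract does not spell out how SBT linearizes an AST, so the following is only a minimal sketch under an assumed scheme: each subtree is wrapped in a bracket pair tagged with its node label. The `Node` class, the `sbt` and `preorder` helpers, and the single-letter labels are all hypothetical and chosen for illustration. The sketch shows why a bracketed sequence can remain unambiguous where a plain pre-order sequence does not.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder(node: Node) -> List[str]:
    """Plain pre-order traversal: emits labels only, so tree shape is lost."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(preorder(child))
    return tokens


def sbt(node: Node) -> List[str]:
    """Assumed bracketed traversal: open the subtree, visit children, close it."""
    tokens = ["(", node.label]
    for child in node.children:
        tokens.extend(sbt(child))
    tokens.extend([")", node.label])
    return tokens


# Two different trees over the same labels: a -> b -> c versus a -> {b, c}.
nested = Node("a", [Node("b", [Node("c")])])
flat = Node("a", [Node("b"), Node("c")])

print(preorder(nested) == preorder(flat))  # the plain sequences collide
print(" ".join(sbt(nested)))
print(" ".join(sbt(flat)))
```

Under this assumed scheme the two trees share one pre-order sequence but yield distinct bracketed sequences, which is the property the abstract attributes to SBT. A real pipeline would obtain the tree from a language parser and feed the token sequence to the Seq2Seq encoder.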

