CL · Aug 29, 2018

Retrieval-Based Neural Code Generation

arXiv:1808.10025v1 · 122 citations
Originality: Incremental advance
AI Analysis

This work makes an incremental improvement to code generation from natural language by adding subtree retrieval.

The paper tackles the generation of complex program source code from natural language. Existing methods struggle to memorize large code structures, so the authors introduce ReCode, a retrieval-based method that improves performance on two code generation tasks by up to +2.6 BLEU.

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. First, we retrieve sentences that are similar to input sentences using a dynamic-programming-based sentence similarity scoring method. Next, we extract n-grams of action sequences that build the associated abstract syntax tree. Finally, we increase the probability of actions that cause the retrieved n-gram action subtree to be in the predicted code. We show that our approach improves the performance on two code generation tasks by up to +2.6 BLEU.
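The three steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain word-level edit distance with a simple length normalization stands in for the paper's dynamic-programming similarity score, the action names (such as `Name:sorted`) are hypothetical, and adding `lam * similarity` to an action's log-probability (with hypothetical parameters `lam` and `max_n`) is one simple way to "increase the probability" of actions that complete a retrieved n-gram. The neural decoder is replaced by a hard-coded dictionary of log-probabilities.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[List[str], List[str]]  # (sentence tokens, AST-building action sequence)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance, filled in row by row (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete wa
                         cur[j - 1] + 1,            # insert wb
                         prev[j - 1] + (wa != wb))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """ASSUMED normalization: 1 - distance / longer length, a score in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def retrieve(query: List[str], corpus: List[Pair], k: int = 1):
    """Step 1: return the k (similarity, actions) pairs closest to the query sentence."""
    scored = [(similarity(query, sent), actions) for sent, actions in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_ngram_table(retrieved, max_n: int = 3):
    """Step 2: map each (n-1)-action prefix to the actions completing a retrieved n-gram."""
    table: Dict[Tuple[str, ...], Dict[str, float]] = defaultdict(dict)
    for sim, actions in retrieved:
        for n in range(1, max_n + 1):
            for i in range(len(actions) - n + 1):
                gram = tuple(actions[i:i + n])
                prefix, nxt = gram[:-1], gram[-1]
                # Keep the best similarity if several retrieved examples share the n-gram.
                table[prefix][nxt] = max(sim, table[prefix].get(nxt, 0.0))
    return table


def rescore(log_probs: Dict[str, float], history: List[str], table,
            max_n: int = 3, lam: float = 1.0) -> Dict[str, float]:
    """Step 3 (ASSUMED additive form): add lam * similarity for each retrieved
    n-gram that the candidate action would complete after the current history."""
    scores = dict(log_probs)
    for n in range(1, max_n + 1):
        if len(history) < n - 1:
            break  # not enough history for this or any longer n-gram
        prefix = tuple(history[len(history) - (n - 1):])
        for action, sim in table.get(prefix, {}).items():
            if action in scores:
                scores[action] += lam * sim
    return scores


# Toy usage with a hypothetical two-example training set.
corpus = [
    ("sort the list x".split(), ["Call", "Name:sorted", "Name:x"]),
    ("open the file f".split(), ["Call", "Name:open", "Name:f"]),
]
retrieved = retrieve("sort the list y".split(), corpus, k=1)
table = build_ngram_table(retrieved)
# Hypothetical decoder output for the action that follows "Call".
log_probs = {"Name:sorted": math.log(0.4), "Name:open": math.log(0.6)}
scores = rescore(log_probs, ["Call"], table)
best = max(scores, key=scores.get)
print(retrieved[0][0], best)
```

In this toy run the base model prefers `Name:open`, but the retrieved example ("sort the list x", similarity 0.75) contributes unigram and bigram bonuses to `Name:sorted`, which then wins. Rescoring at each decoding step, rather than copying a retrieved subtree wholesale, leaves the model free to override a retrieved n-gram when its own probability for an alternative is high enough.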

Code Implementations: 1 repo
