CL LG
Feb 21, 2021

Automatic Code Generation using Pre-Trained Language Models

Luis Perez, Lizi Ottens, Sudharshan Viswanathan

arXiv:2102.10535v1
27 citations

Originality: Incremental advance

AI Analysis

The paper addresses automatic code generation for developers. The advance is incremental because the approach builds on existing pre-trained language models.

The paper applies pre-trained language models to code generation in Python. The fine-tuned model reaches a BLEU score of 0.22, a 46% improvement over a sequence-to-sequence baseline.

Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly structured environment with strict syntax rules. Specifically, we propose an end-to-end machine learning model for code generation in the Python language, built on top of pre-trained language models. We demonstrate that a fine-tuned model can perform well on code generation tasks, achieving a BLEU score of 0.22, an improvement of 46\% over a reasonable sequence-to-sequence baseline. All results and related code used for training and data processing are available on GitHub.
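To make the reported metric concrete, the following is a minimal sketch of a sentence-level BLEU score over code tokens, assuming whitespace tokenization, a single reference, uniform weights over 1- to 4-grams, and no smoothing. The abstract does not say which BLEU variant or tokenizer the authors used, so the function names here (`ngrams`, `bleu`, `implied_baseline`) are hypothetical and the numbers it produces are not comparable to the paper's 0.22. The last helper only works out the arithmetic behind "an improvement of 46%": a score of 0.22 would correspond to a baseline of about 0.22 / 1.46, a value derived here rather than quoted from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU (illustrative, not the paper's setup)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it appears in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


def implied_baseline(score, relative_improvement):
    """Baseline b such that score = b * (1 + relative_improvement)."""
    return score / (1 + relative_improvement)


reference = "def add ( a , b ) : return a + b".split()
candidate = "def add ( a , b ) : return a - b".split()
print(bleu(candidate, reference))       # one wrong token lowers every n-gram precision
print(implied_baseline(0.22, 0.46))     # baseline implied by the abstract's figures
```

Because BLEU rewards n-gram overlap only, a candidate can score well while being syntactically invalid or semantically wrong, as the `+` versus `-` example suggests; this is a known caveat when reading BLEU figures for generated code.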
