Automatic Code Generation using Pre-Trained Language Models
This work addresses code generation for developers, but its contribution is incremental because it builds on existing language models.
The paper tackles the problem of applying pre-trained language models to Python code generation, achieving a BLEU score of 0.22, a 46% improvement over a sequence-to-sequence baseline.
Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly structured environment with strict syntax rules. Specifically, we propose an end-to-end machine learning model for code generation in the Python language built on top of pre-trained language models. We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46\% over a reasonable sequence-to-sequence baseline. All results and related code used for training and data processing are available on GitHub.
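For readers unfamiliar with the metric, BLEU scores a generated sequence by its n-gram overlap with a reference. The following is a minimal sketch of a single-reference, sentence-level variant. It assumes whitespace tokenization and add-one smoothing, neither of which is specified in the abstract, and the function names `sentence_bleu` and `relative_improvement` are illustrative. It is not the evaluation script used for the reported results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU (assumption: add-one smoothing).

    Computes clipped n-gram precisions for n = 1..max_n, takes their
    geometric mean, and applies the brevity penalty.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        # Add-one smoothing keeps the logarithm finite when overlap is zero.
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Penalize candidates that are shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score, as a fraction."""
    return (new_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Hypothetical example pair, tokenized on whitespace.
    ref = "def add(x, y): return x + y".split()
    hyp = "def add(a, b): return a + b".split()
    print(f"BLEU: {sentence_bleu(ref, hyp):.3f}")
```

If the 46\% figure is read as a relative gain, `relative_improvement(0.22, b) = 0.46` would put the baseline `b` near 0.22 / 1.46, or roughly 0.15. The abstract does not report the baseline score directly, so this is only an inference.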