CL, LG | Feb 21, 2021

Automatic Code Generation using Pre-Trained Language Models

arXiv:2102.10535v1 | 27 citations
Originality: Incremental advance
AI Analysis

The paper addresses code generation for developers, but its contribution is incremental because it builds on existing pre-trained language models rather than introducing a new architecture.

The paper applies pre-trained language models to Python code generation. The fine-tuned model achieves a BLEU score of 0.22, a 46% improvement over a sequence-to-sequence baseline.
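The two reported figures imply a rough baseline score. The sketch below assumes the 46% is a relative improvement (new = baseline x 1.46); the summary does not state the baseline BLEU, so the result is an estimate derived from rounded numbers, and the helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(score: float, relative_gain: float) -> float:
    """Baseline implied by a final score and a relative improvement.

    Assumes: score = baseline * (1 + relative_gain).
    """
    return score / (1.0 + relative_gain)


# Figures from the summary: BLEU 0.22, 46% improvement (assumed relative).
baseline = implied_baseline(0.22, 0.46)
print(f"implied baseline BLEU: {baseline:.3f}")
```

Under that assumption the baseline sits near 0.15 BLEU, so the absolute gain is about 0.07 BLEU points.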

Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly structured environment with strict syntax rules. Specifically, we propose an end-to-end machine learning model for code generation in the Python language built on top of pre-trained language models. We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46% over a reasonable sequence-to-sequence baseline. All results and related code used for training and data processing are available on GitHub.
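For readers unfamiliar with the metric, here is a minimal sketch of how a BLEU-style score can be computed over code tokens. It is a simplified single-reference version (clipped n-gram precisions up to 4-grams, brevity penalty, no smoothing) with whitespace tokenization. The abstract does not say which BLEU implementation, tokenizer, or smoothing the authors used, so those choices and the names `ngrams` and `sentence_bleu` are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU without smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "def add ( a , b ) : return a + b".split()
candidate = "def add ( a , b ) : return a - b".split()
score = sentence_bleu(reference, candidate)
print(f"BLEU: {score:.2f}")
```

The example also shows a known limitation of n-gram metrics on code: the candidate differs by a single operator and still scores high, although it computes something different.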

Code Implementations: 1 repo
