PL · LG · May 18, 2022

Transformer-based Program Synthesis for Low-Data Environments

arXiv:2205.09246v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating accurate programs from limited data, a problem relevant to AI and software development, and represents an incremental improvement over existing methods.

The paper tackles the poor performance of transformer models on low-data and long-horizon program synthesis. It uses attributed context-free grammars to generate programs and annotates them with compile-time and runtime attributes. It finds that this approach improves program quality and reduces errors, especially in low-data environments.

Recent advancements in large pre-trained transformer models (GPT2/3, T5) have found use in program synthesis to generate programs that satisfy a set of input/output examples. However, these models perform poorly on long-horizon and low-data tasks, and often do not seem to understand the semantics of the languages they generate. We investigate an approach that tackles both of these issues by using attributed context-free grammars of programming languages to generate programs, and then analyzing the generated programs so that they can be annotated with compile-time and runtime attributes, such as types, so that information about the program can be remembered during long-horizon generation. We first find that synthesized datasets can be made efficiently and can provide transformer models with enough data to perform well on some synthesis tasks. We also find that giving models access to program attributes is especially effective in low-data environments, and tends to improve the quality and reduce the errors of transformer-generated programs.
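
To make the idea in the abstract concrete, the sketch below samples well-typed programs from a toy attributed grammar, renders each node with its type attribute, and runs the program to obtain input/output examples. Everything in it is an illustrative assumption rather than the paper's implementation: the grammar in `PRODUCTIONS`, the `Node` class, the `op:type` annotation format, and the helper names `generate`, `annotate`, and `evaluate` are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                 # production name, e.g. "add", "lt", "if", "x"
    type: str               # synthesized attribute: "int" or "bool"
    children: list = field(default_factory=list)
    value: object = None    # payload for literal terminals


# Hypothetical toy grammar: result type -> list of (op, child types).
PRODUCTIONS = {
    "int": [("add", ["int", "int"]), ("mul", ["int", "int"]),
            ("if", ["bool", "int", "int"])],
    "bool": [("lt", ["int", "int"]), ("not", ["bool"])],
}


def generate(rng, want, depth):
    """Sample an expression whose type attribute equals `want`."""
    if depth == 0:
        if want == "int":
            if rng.random() < 0.5:
                return Node("x", "int")          # the single input variable
            return Node("int", "int", value=rng.randint(0, 9))
        return Node("bool", "bool", value=rng.random() < 0.5)
    op, child_types = rng.choice(PRODUCTIONS[want])
    children = [generate(rng, t, depth - 1) for t in child_types]
    return Node(op, want, children)


def annotate(node):
    """Render the program with every node tagged as `label:type`."""
    if not node.children:
        label = "x" if node.op == "x" else str(node.value)
        return f"{label}:{node.type}"
    inner = " ".join(annotate(c) for c in node.children)
    return f"({node.op}:{node.type} {inner})"


def evaluate(node, x):
    """Run the program on input `x` to produce one I/O example."""
    if node.op == "x":
        return x
    if node.op in ("int", "bool"):
        return node.value
    args = [evaluate(c, x) for c in node.children]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    if node.op == "lt":
        return args[0] < args[1]
    if node.op == "not":
        return not args[0]
    if node.op == "if":
        return args[1] if args[0] else args[2]
    raise ValueError(f"unknown op: {node.op}")


if __name__ == "__main__":
    rng = random.Random(0)
    program = generate(rng, "int", depth=2)
    examples = [(x, evaluate(program, x)) for x in range(3)]
    print(annotate(program))
    print(examples)
```

Because child types are chosen by the grammar before the children are sampled, every generated program is well typed by construction, which is one plausible way a synthetic dataset could pair annotated programs with input/output examples for training.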

Code Implementations: 1 repo