LG · CL · NE · SE · Jun 9, 2021

Energy-Based Models for Code Generation under Compilability Constraints

arXiv:2106.04985v1 · 14 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating syntactically correct, compilable code, which matters to developers and to AI-assisted coding tools. It represents an incremental improvement over existing methods.

The paper frames compilable code generation as constraint satisfaction: it defines an Energy-Based Model (EBM) that combines a pre-trained generative model with a compilability constraint, then trains a generative model to approximate that EBM. The reported result is higher compilability rates without sacrificing the diversity or complexity of the generated samples.

Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021) to train a generative model approximating the EBM. We conduct experiments showing that our proposed approach is able to improve compilability rates without sacrificing diversity and complexity of the generated samples.
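The constraint-as-EBM idea in the abstract can be illustrated with a toy sketch. It assumes the EBM takes the product form P(x) = a(x) * b(x), where a(x) is the pre-trained model's probability and b(x) is 1 for compilable sequences and 0 otherwise; the abstract does not spell this form out, so treat it as an assumption. The four-snippet `base_model` table, the helper names, and the use of Python's `ast.parse` as a stand-in for a compilability check are all hypothetical and chosen only for illustration. The training step with KL-Adaptive Distributional Policy Gradient is not shown.

```python
import ast


def is_compilable(source: str) -> bool:
    """Binary constraint b(x): True if the snippet parses as Python.

    Assumption: ast.parse stands in for whatever compilability check is used.
    """
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Hypothetical stand-in for a pre-trained model a(x): a probability table
# over four candidate snippets (two valid, two with syntax errors).
base_model = {
    "def f(x):\n    return x + 1\n": 0.4,
    "def f(x) return x + 1\n": 0.3,   # missing colon
    "print('hi')\n": 0.2,
    "print('hi'\n": 0.1,              # unclosed parenthesis
}


def ebm_score(source: str, base: dict) -> float:
    """Unnormalized EBM score P(x) = a(x) * b(x) (assumed product form)."""
    return base[source] * (1.0 if is_compilable(source) else 0.0)


def normalize(base: dict):
    """Return the normalized target p(x) = P(x) / Z and the partition Z."""
    scores = {x: ebm_score(x, base) for x in base}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}, z


target, partition = normalize(base_model)

for snippet, prob in target.items():
    print(f"{prob:.3f}  {snippet!r}")
```

In this toy setting the partition value Z equals the base model's compilability rate, and the target keeps the base model's relative preferences among the compilable snippets while assigning zero mass to the rest. With a real language model, Z cannot be enumerated like this, which is why a separate generative model is trained to approximate the EBM.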

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
