LG NE PL ML Nov 6, 2019

Data Generation for Neural Programming by Example

arXiv:1911.02624v1, 3 citations
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck in machine learning for program synthesis, offering an incremental improvement in data generation methods for researchers and practitioners in the field.

The paper tackles the challenge of generating meaningful synthetic training data for neural programming by example, a prerequisite for models that generalize. It introduces an SMT solver-based method for synthesizing inputs and reports that the resulting data improves both the discriminatory power of example sets and model generalization to unfamiliar data.

Programming by example is the problem of synthesizing a program from a small set of input/output pairs. Recent works applying machine learning methods to this task show promise, but typically rely on generating synthetic examples for training. A particular challenge lies in generating meaningful sets of inputs and outputs that characterize a given program well and accurately demonstrate its behavior. When the examples used for testing are generated by the same method as the training data, a model's performance may partly depend on this similarity. In this paper we introduce a novel approach using an SMT solver to synthesize inputs which cover a diverse set of behaviors for a given program. We carry out a case study comparing this method to existing synthetic data generation procedures in the literature, and find that data generated using our approach improves both the discriminatory power of example sets and the ability of trained machine learning models to generalize to unfamiliar data.
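
To make "discriminatory power of example sets" concrete, here is a minimal Python sketch. It is not the paper's method: the paper uses an SMT solver, whereas this sketch substitutes brute-force enumeration over a tiny integer domain so that it runs with the standard library alone. The target program, the distractor programs, and the names `discriminatory_power` and `greedy_inputs` are all hypothetical, chosen only for illustration. Discriminatory power is taken here to mean the fraction of alternative programs that an input set tells apart from the target, which is one plausible reading of the abstract.

```python
from itertools import product
from typing import Callable, List, Sequence

Program = Callable[[List[int]], List[int]]


def target(xs: List[int]) -> List[int]:
    """Hypothetical target program: keep the positive values, then double them."""
    return [2 * x for x in xs if x > 0]


# Hypothetical near-miss programs that agree with the target on some inputs.
DISTRACTORS: List[Program] = [
    lambda xs: [2 * x for x in xs],             # forgets the filter
    lambda xs: [2 * x for x in xs if x >= 0],   # off-by-one filter
    lambda xs: [x * x for x in xs if x > 0],    # squares instead of doubling
]


def discriminatory_power(inputs: Sequence[List[int]], prog: Program,
                         others: Sequence[Program]) -> float:
    """Fraction of `others` whose output differs from `prog` on some input."""
    ruled_out = sum(any(o(i) != prog(i) for i in inputs) for o in others)
    return ruled_out / len(others)


def greedy_inputs(prog: Program, others: Sequence[Program],
                  domain=range(-2, 3), length: int = 2) -> List[List[int]]:
    """Brute-force stand-in for a solver (an assumption of this sketch):
    repeatedly pick the input that separates the most remaining distractors."""
    candidates = [list(c) for c in product(domain, repeat=length)]
    remaining = list(others)
    chosen: List[List[int]] = []
    while remaining:
        best = max(candidates,
                   key=lambda c: sum(o(c) != prog(c) for o in remaining))
        survivors = [o for o in remaining if o(best) == prog(best)]
        if len(survivors) == len(remaining):
            break  # no candidate separates anything further
        chosen.append(best)
        remaining = survivors
    return chosen


# A narrow input set: every distractor happens to agree with the target here.
narrow = [[2], [2, 2]]
print(discriminatory_power(narrow, target, DISTRACTORS))

# Inputs chosen to separate the target from the distractors.
picked = greedy_inputs(target, DISTRACTORS)
print(picked, discriminatory_power(picked, target, DISTRACTORS))
```

The narrow set contains only the value 2, where doubling and squaring coincide and no filtering occurs, so it cannot rule out any distractor. Inputs that mix zero, negative, and positive values exercise the behaviors on which the programs differ, which is the intuition behind covering a diverse set of behaviors.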

Code Implementations: 1 repo
