Creating Synthetic Datasets via Evolution for Neural Program Synthesis
This paper addresses a generalization problem in neural program synthesis systems that rely on example-based specifications, but the contribution appears incremental because it builds on existing adversarial data methods.
The paper tackles the poor generalization of neural program synthesis models that are trained on randomly generated input-output examples. It shows that current state-of-the-art methods suffer from this issue and that existing countermeasures are insufficient. It then proposes an adversarial approach to control bias in synthetic data distributions and reports that this approach outperforms current methods.
Program synthesis is the task of automatically generating a program consistent with a given specification. A natural way to specify programs is to provide examples of desired input-output behavior, and many current program synthesis approaches have achieved impressive results after training on randomly generated input-output examples. However, recent work has discovered that some of these approaches generalize poorly to data distributions different from that of the randomly generated examples. We show that this problem applies to other state-of-the-art approaches as well and that current methods to counteract this problem are insufficient. We then propose a new, adversarial approach to control the bias of synthetic data distributions and show that it outperforms current approaches.
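To make "a program consistent with a given specification" concrete, the following is a minimal sketch of example-based synthesis by brute-force enumeration. It is not the paper's method: the toy DSL, the primitive names, and the functions run, is_consistent, and synthesize are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy DSL (an assumption, not from the paper):
# each primitive maps a list of ints to a list of ints.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}


def run(program, xs):
    """Apply the primitives named in `program` from left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def is_consistent(program, examples):
    """A program meets the specification if it reproduces every example."""
    return all(run(program, inp) == out for inp, out in examples)


def synthesize(examples, max_length=2):
    """Return the first consistent program, trying shorter programs first."""
    for length in range(1, max_length + 1):
        for program in product(PRIMITIVES, repeat=length):
            if is_consistent(program, examples):
                return program
    return None  # no program of length <= max_length fits the examples


if __name__ == "__main__":
    # The specification is given purely as input-output pairs.
    spec = [([3, 1, 2], [2, 4, 6]), ([5, 1], [2, 10])]
    print(synthesize(spec))
```

Neural approaches replace the exhaustive search with a learned model, which is where the distribution of the training examples starts to matter: a model only sees the inputs and programs that the data generator happens to produce.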