Neural Program Synthesis with Priority Queue Training
This work targets program synthesis, a problem relevant to AI and software engineering, and aims at more efficient search and more readable synthesized programs. The contribution is incremental in that it builds on existing iterative optimization methods.
The paper tackles reward-maximizing program synthesis by introducing priority queue training (PQT), an algorithm that iteratively trains an RNN on the top programs found so far and samples new programs from it. PQT significantly outperforms genetic algorithm and reinforcement learning baselines on BF, a Turing-complete language, and adding a length penalty to the reward yields short, human-readable programs.
We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of the K best programs from a priority queue of the programs generated so far. We then synthesize new programs by sampling from the RNN and add them to the priority queue. We benchmark our algorithm, called priority queue training (or PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing-complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human-readable programs.
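The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the RNN is replaced by a smoothed token-count model refit on the queue contents, the reward (distance between a program's output and a fixed target) is a toy choice, and every name and default here (`run_bf`, `reward`, `sample`, `pqt`, the tape size, the step limit, the `$` end marker) is hypothetical. Only the overall structure follows the text: keep the K best programs in a priority queue, fit a generative model to them, sample new programs, and optionally subtract a length penalty from the reward.

```python
import heapq
import random
from collections import Counter

ALPHABET = "><+-.,[]"  # the eight BF commands
EOS = "$"              # hypothetical end-of-program token for the sampler

def run_bf(program, max_steps=1000, tape_len=30):
    """Run a BF program with no input; return output values, or None if
    brackets are unbalanced or the step limit is exceeded."""
    stack, jump = [], {}
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        return None
    tape, ptr, pc, out, steps = [0] * tape_len, 0, 0, [], 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            return None
        c = program[pc]
        if c == ">":
            ptr = (ptr + 1) % tape_len
        elif c == "<":
            ptr = (ptr - 1) % tape_len
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(tape[ptr])
        elif c == ",":
            tape[ptr] = 0  # assumption: empty input reads as 0
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]
        pc += 1
    return out

def reward(program, target=(3,), length_penalty=0.0):
    """Toy reward: negative distance to the target output, minus an optional
    per-character length penalty."""
    out = run_bf(program)
    if out is None:
        return -1000.0
    diff = sum(abs(a - b) for a, b in zip(out, target))
    diff += 256 * abs(len(out) - len(target))
    return -diff - length_penalty * len(program)

def sample(rng, tokens, weights, max_len):
    """Draw tokens independently until EOS or max_len (stand-in for RNN sampling)."""
    chars = []
    for _ in range(max_len):
        t = rng.choices(tokens, weights=weights)[0]
        if t == EOS:
            break
        chars.append(t)
    return "".join(chars)

def pqt(iterations=200, batch=50, k=10, max_len=12, length_penalty=0.0, seed=0):
    rng = random.Random(seed)
    tokens = list(ALPHABET) + [EOS]
    weights = [1.0] * len(tokens)  # start from a uniform model
    queue, seen = [], set()        # min-heap holding the k best unique programs
    for _ in range(iterations):
        for _ in range(batch):
            prog = sample(rng, tokens, weights, max_len)
            if prog in seen:
                continue
            seen.add(prog)
            item = (reward(prog, length_penalty=length_penalty), prog)
            if len(queue) < k:
                heapq.heappush(queue, item)
            else:
                heapq.heappushpop(queue, item)  # drops the lowest-reward entry
        # "Training" step: refit the count model on the queue contents.
        counts = Counter()
        for _, prog in queue:
            counts.update(prog)
            counts[EOS] += 1
        weights = [1.0 + counts[t] for t in tokens]
    return sorted(queue, reverse=True)  # best program first

if __name__ == "__main__":
    for r, prog in pqt(length_penalty=0.1)[:3]:
        print(r, repr(prog))
```

The count model ignores token order, so it is far weaker than an RNN and is used here only to keep the sketch self-contained. Setting `length_penalty` above zero lowers the reward of longer programs, which is the mechanism the abstract credits for shorter, more readable results.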