Program Synthesis Through Reinforcement Learning Guided Tree Search
This addresses program synthesis for developers or researchers by offering an incremental improvement over existing search and learning approaches.
The paper tackles program synthesis by framing it as a reinforcement learning problem combined with a priority search tree, and demonstrates that this hybrid method outperforms baselines including pure RL and a state-of-the-art search method on a RISC-V assembly subset.
Program Synthesis is the task of generating a program from a provided specification. Traditionally, this has been treated as a search problem by the programming languages (PL) community and more recently as a supervised learning problem by the machine learning community. Here, we propose a third approach, representing the task of synthesizing a given program as a Markov decision process solvable via reinforcement learning(RL). From observations about the states of partial programs, we attempt to find a program that is optimal over a provided reward metric on pairs of programs and states. We instantiate this approach on a subset of the RISC-V assembly language operating on floating point numbers, and as an optimization inspired by search-based techniques from the PL community, we combine RL with a priority search tree. We evaluate this instantiation and demonstrate the effectiveness of our combined method compared to a variety of baselines, including a pure RL ablation and a state of the art Markov chain Monte Carlo search method on this task.