Latent Programmer: Discrete Latent Codes for Program Synthesis
This work makes search in program synthesis more efficient and improves synthesis accuracy, which benefits developers and researchers working on automated code generation.
This paper addresses the challenge of searching large output spaces in sequence learning tasks such as program synthesis by learning compact, discrete latent codes. The resulting method, the Latent Programmer, first predicts a discrete latent code and then generates the program, which significantly improves synthesis accuracy on string transformation and on program generation from natural language.
In many sequence learning tasks, such as program synthesis and document summarization, a key problem is searching over a large space of possible output sequences. We propose to learn representations of the outputs that are specifically meant for search: rich enough to specify the desired output but compact enough to make search more efficient. Discrete latent codes are appealing for this purpose, as they naturally allow sophisticated combinatorial search strategies. The latent codes are learned using a self-supervised learning principle, in which first a discrete autoencoder is trained on the output sequences, and then the resulting latent codes are used as intermediate targets for the end-to-end sequence prediction task. Based on these insights, we introduce the Latent Programmer, a program synthesis method that first predicts a discrete latent code from input/output examples, and then generates the program in the target language. We evaluate the Latent Programmer on two domains: synthesis of string transformation programs, and generation of programs from natural language descriptions. We demonstrate that the discrete latent representation significantly improves synthesis accuracy.
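The two-stage inference described above (predict a short discrete latent code, then generate the program conditioned on it) can be illustrated with a small search sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the search strategy shown is a plain two-level beam search chosen for illustration, and `toy_latent_step`, `toy_program_step_for`, the token vocabularies, and all probabilities are hypothetical stand-ins for trained models that would be conditioned on the input/output examples.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
# A step function maps a prefix to log-probabilities of the next token.
StepFn = Callable[[Seq], Dict[str, float]]


def beam_search(step_fn: StepFn, beam_size: int, length: int) -> List[Tuple[Seq, float]]:
    """Return up to `beam_size` sequences of fixed `length` with their log-probs."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    for _ in range(length):
        candidates = [
            (prefix + (tok,), score + logp)
            for prefix, score in beams
            for tok, logp in step_fn(prefix).items()
        ]
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def two_level_search(
    latent_step: StepFn,
    program_step_for: Callable[[Seq], StepFn],
    latent_beam: int,
    program_beam: int,
    latent_len: int,
    program_len: int,
) -> List[Tuple[Seq, Seq, float]]:
    """Search over latent codes first, then over programs given each code."""
    results = []
    for code, code_score in beam_search(latent_step, latent_beam, latent_len):
        decoder_step = program_step_for(code)
        for program, prog_score in beam_search(decoder_step, program_beam, program_len):
            results.append((program, code, code_score + prog_score))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# --- Hypothetical toy models (assumptions, for illustration only) ---
def toy_latent_step(prefix: Seq) -> Dict[str, float]:
    # Stand-in for a latent predictor: slightly prefers code token "1".
    return {"0": math.log(0.4), "1": math.log(0.6)}


def toy_program_step_for(code: Seq) -> StepFn:
    # Stand-in for a decoder: the first latent token picks the favored program token.
    favored = "Upper" if code[0] == "1" else "Lower"

    def step(prefix: Seq) -> Dict[str, float]:
        return {tok: math.log(0.8 if tok == favored else 0.2) for tok in ("Upper", "Lower")}

    return step


ranked = two_level_search(
    toy_latent_step, toy_program_step_for,
    latent_beam=2, program_beam=2, latent_len=2, program_len=3,
)
best_program, best_code, best_score = ranked[0]
```

The point of the sketch is the search budget: the latent sequence is shorter than the program, so a beam over latent codes can cover coarse alternatives cheaply before any program tokens are generated. In a real system each candidate program would also be checked against the input/output examples.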