Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis
The work addresses the scarcity of unit-test data for RL training in code synthesis, offering an incremental improvement over prior RL-based approaches.
The paper tackles the challenge of obtaining unit test data for reinforcement learning in code synthesis by introducing an automatic method to generate function signatures and associated unit tests. It also proposes an actor-critic RL training scheme that, combined with the generated data, improves a pre-trained model's performance by up to 9.9% over the base model and by up to 4.3% over existing RL methods.
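The summary mentions an automatic method for generating function signatures and unit tests but does not say how it works. One plausible way to picture such a pipeline is the minimal Python sketch that follows. It assumes the raw material is crawled Python functions. The sketch parses a function, recovers its signature with `inspect.signature`, runs it on hand-picked argument tuples, and freezes the observed outputs as assert-style tests. The helpers `extract_function` and `record_unit_tests` and the choice of inputs are hypothetical illustrations, not the authors' method.

```python
import ast
import inspect
from typing import Any, Callable, Dict, Iterable, List, Tuple


def extract_function(source: str) -> Tuple[str, str, Callable[..., Any]]:
    """Hypothetical helper: return (name, signature line, callable) for the
    first top-level function definition in `source`."""
    tree = ast.parse(source)
    defs = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
    if not defs:
        raise ValueError("no top-level function definition found")
    name = defs[0].name
    namespace: Dict[str, Any] = {}
    # Executes crawled code: in a real pipeline this must run in a sandbox.
    exec(compile(tree, "<crawled>", "exec"), namespace)
    func = namespace[name]
    signature = f"def {name}{inspect.signature(func)}:"
    return name, signature, func


def record_unit_tests(
    name: str, func: Callable[..., Any], inputs: Iterable[Tuple[Any, ...]]
) -> List[str]:
    """Hypothetical helper: run the reference function on each argument tuple
    and freeze the observed output as an assert-style unit test.
    Inputs on which the reference function raises are skipped."""
    tests: List[str] = []
    for args in inputs:
        try:
            expected = func(*args)
        except Exception:
            continue
        call = f"{name}({', '.join(repr(arg) for arg in args)})"
        tests.append(f"assert {call} == {expected!r}")
    return tests


if __name__ == "__main__":
    source = "def add(a: int, b: int) -> int:\n    return a + b\n"
    name, signature, func = extract_function(source)
    print(signature)
    for test in record_unit_tests(name, func, [(1, 2), (-3, 3)]):
        print(test)
```

Freezing outputs via `repr` only yields valid tests for values whose `repr` evaluates back to an equal value (ints, strings, lists and the like). A real pipeline would also need sandboxed execution and some way of choosing informative inputs.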
The advent of large pre-trained language models in the domain of code synthesis has led to remarkable performance on various benchmarks. These models treat code generation much like natural language generation and are trained with a Language Modelling (LM) objective. In addition, programming language code can be evaluated precisely with respect to its semantics, using unit tests to check its functional correctness, which makes Reinforcement Learning (RL) a natural further training paradigm. Previous work has shown that RL can be applied in this way to improve models' coding capabilities. However, such RL-based methods rely on a reward signal derived from predefined unit tests, which are much harder to obtain than the huge crawled code datasets used for LM objectives. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated unit tests, suitable for RL training of code synthesis models. We also introduce a simple yet effective Actor-Critic RL training scheme. We show that, in conjunction with the automatically generated training data, it improves a pre-trained code language model's performance by up to 9.9% over the original underlying code synthesis LM, and by up to 4.3% over RL-based models trained with standard PPO or CodeRL.
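The following toy Python sketch makes the unit-test reward signal and the actor-critic idea concrete. It assumes a CodeRL-style shaped reward, using the values CodeRL uses: -1.0 for a compile error, -0.6 for a runtime error, -0.3 for a failed test, and +1.0 when all tests pass. It also reduces the code LM to a softmax policy over a few fixed candidate programs for a single prompt, with a scalar critic serving as the learned baseline. The functions `unit_test_reward` and `train_actor_critic`, the candidate programs, and the learning rates are hypothetical. The paper's actual scheme operates on a pre-trained LM and may differ in its reward and critic design.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def unit_test_reward(program: str, tests: Sequence[str]) -> float:
    """Hypothetical reward: assumed CodeRL-style values from running unit tests."""
    try:
        code = compile(program, "<sample>", "exec")
    except SyntaxError:
        return -1.0  # program does not compile
    namespace: Dict[str, object] = {}
    try:
        exec(code, namespace)  # untrusted model output: sandbox this in practice
        for test in tests:
            exec(test, namespace)
    except AssertionError:
        return -0.3  # runs, but at least one unit test fails
    except Exception:
        return -0.6  # runtime error while defining or calling the function
    return 1.0  # all unit tests pass


def softmax(logits: Sequence[float]) -> List[float]:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(
    candidates: Sequence[str],
    tests: Sequence[str],
    steps: int = 2000,
    lr_actor: float = 0.1,
    lr_critic: float = 0.1,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Toy one-step actor-critic: the actor is a softmax over candidate programs,
    the critic is a scalar value estimate for the single prompt."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    value = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = unit_test_reward(candidates[action], tests)
        advantage = reward - value  # the critic's estimate acts as the baseline
        # Actor: ascend advantage * log pi(action); d log pi(a) / d logit_k = 1[k == a] - pi_k
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            logits[k] += lr_actor * advantage * (indicator - probs[k])
        # Critic: descend on (reward - value)^2, with the factor 2 folded into lr_critic
        value += lr_critic * advantage
    return softmax(logits), value


if __name__ == "__main__":
    tests = ["assert add(1, 2) == 3", "assert add(-3, 3) == 0"]
    candidates = [
        "def add(a, b) return a + b",        # syntax error
        "def add(a, b):\n    return a - b",   # runs but fails the tests
        "def add(a, b):\n    return a + b",   # passes all tests
    ]
    probs, value = train_actor_critic(candidates, tests)
    print([round(p, 3) for p in probs], round(value, 3))
```

Subtracting the critic's estimate from the reward means a sampled program is reinforced only when it scores better than the critic expects, which reduces the variance of a plain REINFORCE update. In an LM setting, the same advantage would scale the summed token log-probabilities of a sampled program.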