LG AI | Jul 15, 2021

A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis

arXiv:2107.07373v25.53 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses mathematical reasoning for AI systems, but it is incremental as it applies existing reinforcement learning methods to a new dataset format.

The authors approach mathematical reasoning by converting the DeepMind Mathematics Dataset into a reinforcement learning environment for program synthesis. A model learns to construct compute graphs, and a graph that computes the correct answer earns a positive reward.

We convert the DeepMind Mathematics Dataset into a reinforcement learning environment by interpreting it as a program synthesis problem. Each action taken in the environment adds an operator or an input into a discrete compute graph. Graphs which compute correct answers yield positive reward, enabling the optimization of a policy to construct compute graphs conditioned on problem statements. Baseline models are trained using Double DQN on various subsets of problem types, demonstrating the capability to learn to correctly construct graphs despite the challenges of combinatorial explosion and noisy rewards.
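The action-by-action graph construction described in the abstract can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' environment: the class name `ToyGraphEnv`, the two-operator action set, the postfix (stack-based) build order, the step limit, and the 0/1 reward values are all hypothetical choices made for illustration. The observation here is just the current stack of values, whereas the paper conditions the policy on the problem statement.

```python
import operator

# Hypothetical action set: two binary operators, plus one "input" action per
# number extracted from the problem statement.
OPERATORS = {"add": operator.add, "mul": operator.mul}


class ToyGraphEnv:
    """Toy episode: each action adds an operator or an input to an expression.

    The expression is built in postfix order, so the partial compute graph
    can be represented as a stack of already-evaluated sub-results.
    """

    def __init__(self, inputs, answer, max_steps=7):
        self.inputs = list(inputs)
        self.answer = answer
        self.max_steps = max_steps
        self.actions = list(OPERATORS) + [f"in{i}" for i in range(len(self.inputs))]
        self.reset()

    def reset(self):
        self.stack = []
        self.steps = 0
        return tuple(self.stack)

    def step(self, action_index):
        """Apply one action and return (observation, reward, done)."""
        name = self.actions[action_index]
        self.steps += 1
        if name in OPERATORS:
            if len(self.stack) < 2:
                # An operator without two operands is invalid: end with no reward.
                return tuple(self.stack), 0.0, True
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(OPERATORS[name](left, right))
        else:
            self.stack.append(self.inputs[int(name[2:])])
        # Positive reward only when the graph reduces to the correct answer.
        solved = len(self.stack) == 1 and self.stack[0] == self.answer
        done = solved or self.steps >= self.max_steps
        return tuple(self.stack), (1.0 if solved else 0.0), done


def run_episode(env, action_indices):
    """Replay a fixed action sequence; return the last reward and done flag."""
    env.reset()
    reward, done = 0.0, False
    for index in action_indices:
        _, reward, done = env.step(index)
        if done:
            break
    return reward, done
```

For a problem such as "3 + 4 * 2" with inputs `[3, 4, 2]` and answer `11`, the action names are `["add", "mul", "in0", "in1", "in2"]`, and the sequence in0, in1, in2, mul, add builds a graph that evaluates to 11. Even in this toy, the number of possible action sequences grows exponentially with episode length while only a few earn reward, which hints at the combinatorial explosion the abstract mentions.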
