LG · AI · Jul 15, 2021

A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis

arXiv:2107.07373v2 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses mathematical reasoning for AI systems, but its contribution is incremental: it applies existing reinforcement learning methods to a new dataset format.

The authors approach mathematical reasoning by converting the DeepMind Mathematics Dataset into a reinforcement learning environment for program synthesis, in which models learn to construct compute graphs and receive positive reward when a graph computes the correct answer.

We convert the DeepMind Mathematics Dataset into a reinforcement learning environment by interpreting it as a program synthesis problem. Each action taken in the environment adds an operator or an input into a discrete compute graph. Graphs which compute correct answers yield positive reward, enabling the optimization of a policy to construct compute graphs conditioned on problem statements. Baseline models are trained using Double DQN on various subsets of problem types, demonstrating the capability to learn to correctly construct graphs despite the challenges of combinatorial explosion and noisy rewards.
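The action-by-action graph construction described in the abstract can be illustrated with a toy environment. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ToyGraphEnv`, the three-operator set, the prefix-order construction, the `max_nodes` cap, and the 0/1 reward are all hypothetical choices made for illustration. Each action is either an operator name or an index into the problem's inputs, and reward is given only when the finished graph evaluates to the correct answer.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical operator set (name -> (function, arity)); the paper's
# environment defines its own operators.
OPERATORS = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
}


@dataclass
class ToyGraphEnv:
    inputs: list            # values assumed to be extracted from the problem text
    answer: float           # ground-truth answer for the problem
    max_nodes: int = 7      # assumed cap on graph size
    tokens: list = field(default_factory=list)
    open_slots: int = 1     # unfilled argument positions; starts with the root

    def step(self, action):
        """Add one node (operator name or input index) in prefix order.

        Returns (reward, done).
        """
        self.tokens.append(action)
        arity = OPERATORS[action][1] if action in OPERATORS else 0
        # The new node fills one slot and opens `arity` new ones.
        self.open_slots += arity - 1
        if self.open_slots == 0:
            # Graph is complete: reward 1 only for a correct answer.
            reward = 1.0 if self._evaluate() == self.answer else 0.0
            return reward, True
        if len(self.tokens) >= self.max_nodes:
            # Graph grew too large without completing: end with no reward.
            return 0.0, True
        return 0.0, False

    def _evaluate(self):
        """Evaluate the prefix-order token list with a stack."""
        stack = []
        for tok in reversed(self.tokens):
            if tok in OPERATORS:
                fn, arity = OPERATORS[tok]
                args = [stack.pop() for _ in range(arity)]
                stack.append(fn(*args))
            else:
                stack.append(self.inputs[tok])
        return stack[0]


# Usage: build (2 + 3) * 4 for a problem whose answer is 20.
env = ToyGraphEnv(inputs=[2, 3, 4], answer=20)
for action in ["mul", "add", 0, 1, 2]:
    reward, done = env.step(action)
```

Because reward arrives only at the end of an episode and most action sequences produce wrong graphs, even this toy version hints at the sparse-reward and combinatorial-explosion issues the abstract mentions.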

Code Implementations: 1 repo
