CL | AI | LG | Jul 17, 2023

A mixed policy to improve performance of language models on math problems

arXiv:2307.08767v1 | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the wrong answers that language models produce on deterministic math problems. It is an incremental advance, combining existing methods into a specific hybrid approach.

The authors tackled the problem of language models generating wrong answers in math reasoning by proposing a mixed policy exploration approach using reinforcement learning, achieving a performance gain of over 2% on the GSM8K dataset with GPT-2.

When solving math problems, most language models use a sampling strategy to predict the next word according to conditional probabilities. In a math reasoning step, this may generate a wrong answer. Considering that math problems are deterministic, we propose a mixed policy exploration approach to solve math problems with reinforcement learning. In particular, we propose a two-level token exploration policy: the abstract level explores the next token with probability, and the second level is deterministic. Specifically, the abstract-level policy decides whether the token is an operator or an operand by probability sampling, while the second level deterministically selects the next token with the highest score in a greedy way. We test our method on the GSM8K dataset with the GPT-2 model and demonstrate a performance gain of more than $2\%$. Our implementation is available at https://github.com/vividitytech/math_lm_rl.
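The two-level policy described in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: it assumes each vocabulary token carries a hypothetical "operator" or "operand" label, that the abstract level samples a class in proportion to the softmax mass of its tokens, and that the second level takes the highest-scoring token within the sampled class. The names `softmax`, `mixed_policy_step`, and `token_classes` are illustrative and do not come from the paper or its repository.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_policy_step(logits, token_classes, rng=random):
    """Choose the next token index with a two-level policy.

    logits        -- one score per vocabulary token
    token_classes -- hypothetical label per token: "operator" or "operand"
    rng           -- the random module or a random.Random instance
    """
    probs = softmax(logits)

    # Abstract level (stochastic): total probability mass of each class.
    class_mass = {}
    for p, c in zip(probs, token_classes):
        class_mass[c] = class_mass.get(c, 0.0) + p
    classes = list(class_mass)
    weights = [class_mass[c] for c in classes]
    chosen_class = rng.choices(classes, weights=weights, k=1)[0]

    # Second level (deterministic): greedy pick inside the sampled class.
    candidates = [i for i, c in enumerate(token_classes) if c == chosen_class]
    best = max(candidates, key=lambda i: logits[i])
    return chosen_class, best


# Toy vocabulary, assumed for illustration only.
vocab = ["+", "-", "3", "5"]
token_classes = ["operator", "operator", "operand", "operand"]
logits = [2.0, 0.5, 1.0, 3.0]

chosen_class, idx = mixed_policy_step(logits, token_classes, random.Random(0))
print(chosen_class, vocab[idx])
```

In this toy setup the only randomness is the operator/operand choice. Once the class is fixed, the token is always the top-scoring member of that class ("+" for operators, "5" for operands), which mirrors the abstract's split between a probabilistic abstract level and a greedy second level.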

Code Implementations: 1 repo
