LG · Oct 3, 2025

RAPID: An Efficient Reinforcement Learning Algorithm for Small Language Models

arXiv:2510.03515v1 · 1 citation · h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues for researchers and practitioners using RL with small language models, but it is incremental as it builds on existing RL methods.

The authors address the high resource cost of fine-tuning small language models with reinforcement learning (RL). Their RAPID algorithm reduces running time by 11%-34% on three benchmarks compared to state-of-the-art RL algorithms, while maintaining similar or better accuracy.

Reinforcement learning (RL) has emerged as a promising strategy for finetuning small language models (SLMs) to solve targeted tasks such as math and coding. However, RL algorithms tend to be resource-intensive, taking a significant amount of time to train. We propose RAPID, a novel RL algorithm that can substantially reduce the running time of RL. Our key insight is that RL tends to be costly due to the need to perform both inference and backpropagation during training. To maximize use of computational resources, our algorithm performs inference in large batches, and then performs off-policy policy gradient updates in mini-batches. For off-policy updates, we incorporate group advantage estimation into the policy gradient algorithm, and derive an importance weighted estimator to correct for the bias arising from off-policy learning. Our experiments demonstrate that our algorithm can reduce running time by 11%-34% on three benchmarks compared to state-of-the-art RL algorithms while maintaining similar or better accuracy.
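The abstract describes three ingredients: inference in large batches, group advantage estimation, and importance weighting to correct off-policy mini-batch updates. The sketch below illustrates those generic ideas only. It is not the estimator derived in the paper, which the abstract does not spell out. All function names, the mean/std normalization of rewards within a group, and the optional weight cap are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Score each sampled response relative to its own group.

    Assumption: advantages are rewards centered by the group mean and
    scaled by the group standard deviation (a common group-based choice).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def importance_weights(logp_current, logp_behavior, max_weight=None):
    """Per-sample ratio pi_current / pi_behavior, computed from log-probs.

    logp_behavior comes from the policy that generated the large inference
    batch; logp_current comes from the policy being updated.
    max_weight is a hypothetical cap to limit variance.
    """
    weights = [math.exp(c - b) for c, b in zip(logp_current, logp_behavior)]
    if max_weight is not None:
        weights = [min(w, max_weight) for w in weights]
    return weights


def surrogate_objective(logp_current, logp_behavior, rewards):
    """Importance-weighted average advantage over one group of samples."""
    advantages = group_advantages(rewards)
    weights = importance_weights(logp_current, logp_behavior)
    return mean(w * a for w, a in zip(weights, advantages))


def minibatches(items, size):
    """Split one large inference batch into mini-batches for updates."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

In a training loop along these lines, one large batch of responses would be generated once, and then several gradient steps would be taken over `minibatches(...)` of it. After the first step the current policy no longer matches the one that generated the data, which is why the ratio in `importance_weights` is needed.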

