LG · AI · Jun 1

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing

arXiv:2606.02218 · 80.2
Predicted impact: top 15% in LG · last 90 days · Originality: Incremental advance
AI Analysis

For practitioners of synchronous on-policy RL (e.g., GRPO), SAGC offers a practical remedy for the straggler problem, which worsens as group size grows, improving training efficiency without sacrificing model quality.

SAGC dynamically adjusts group size during synchronous on-policy RL to mitigate straggler delays, improving wall-clock efficiency and achieving competitive or better final model quality on reasoning benchmarks.

Synchronous reinforcement learning methods such as Group Relative Policy Optimization (GRPO) provide stable and reproducible on-policy training, but they are highly vulnerable to stragglers: a single unusually long rollout can delay reward computation and parameter updates for the entire group. This problem becomes more severe as group size increases, creating a tension between the benefits of larger groups and the wall-clock cost of synchronization stalls. We propose Straggler-Aware Group Control (SAGC), a dynamic group-size controller that adapts the training group online based on observed rollout behavior. SAGC formulates group-size selection as an online constrained optimization problem, seeking to retain the benefits of larger groups while controlling the long-term rate of straggler events. Across synchronous GRPO and DAPO training, and on top of both vanilla and strong engineered baselines, SAGC consistently reduces straggler incidence and improves wall-clock efficiency while achieving competitive or better training reward. We further show that these gains transfer to final model quality: SAGC is competitive with or better than the strongest static group-size baseline on downstream reasoning benchmarks, and often produces shorter outputs without any explicit length penalty. These results position dynamic group control as a practical way to make synchronous on-policy RL more efficient and robust.
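
The abstract does not spell out SAGC's update rule, so the following is only a rough sketch of what an online constrained group-size controller could look like, not the authors' algorithm. It assumes a generic primal-dual (Lagrangian) scheme: a multiplier rises when straggler events exceed a budget and falls otherwise, and the controller picks the group size that best trades a size-dependent benefit against the multiplier-weighted straggler estimate. All names and values here (`GroupSizeController`, `candidates`, `target_rate`, `dual_step`, the benefit term, and the median-ratio straggler test) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSizeController:
    """Illustrative primal-dual controller; NOT the paper's algorithm."""
    candidates: tuple = (4, 8, 16, 32)  # hypothetical allowed group sizes
    target_rate: float = 0.1            # hypothetical long-run straggler budget
    dual_step: float = 0.5              # hypothetical multiplier step size
    lam: float = 0.0                    # Lagrange multiplier on the constraint
    est: dict = field(default_factory=dict)     # straggler-rate estimate per size
    counts: dict = field(default_factory=dict)  # observations per size

    def choose(self) -> int:
        # Assumed benefit grows linearly with group size; the penalty is the
        # multiplier times the estimated straggler rate for that size.
        def score(g: int) -> float:
            benefit = g / max(self.candidates)
            return benefit - self.lam * self.est.get(g, 0.0)
        return max(self.candidates, key=score)

    def update(self, group_size: int, straggled: bool) -> None:
        # Running mean of straggler events observed at this group size.
        n = self.counts.get(group_size, 0) + 1
        prev = self.est.get(group_size, 0.0)
        self.counts[group_size] = n
        self.est[group_size] = prev + (float(straggled) - prev) / n
        # Dual ascent: raise lam when events exceed the budget, lower otherwise.
        self.lam = max(
            0.0, self.lam + self.dual_step * (float(straggled) - self.target_rate)
        )


def is_straggler_event(rollout_lengths, ratio=3.0):
    """Hypothetical test: the longest rollout exceeds `ratio` x the median."""
    ordered = sorted(rollout_lengths)
    median = ordered[len(ordered) // 2]
    return ordered[-1] > ratio * median
```

In a training loop under these assumptions, each step would call `choose()` before sampling rollouts, then `update(group_size, is_straggler_event(lengths))` once the group finishes. Repeated straggler events at the largest size push the multiplier up until a smaller size scores higher.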

