LG DC · Oct 30, 2025

ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems

Qiaoling Chen, Zijun Liu, Peng Sun, Shenggui Li, Guoteng Wang, Ziming Liu, Yonggang Wen, Siyuan Feng, Tianwei Zhang

arXiv:2510.26475v1 · 8 citations · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses efficiency bottlenecks in RL-based LLM adaptation and offers a practical solution for researchers and practitioners. The advance is incremental, since it adapts existing speculative decoding methods.

The paper tackles slow generation in reinforcement learning (RL) for large language models, identifies gaps in integrating speculative decoding into RL systems, and reports up to 4.5x speedup while preserving reward convergence and training stability.

Adapting large language models (LLMs) via reinforcement learning (RL) is often bottlenecked by the generation stage, which can consume over 75% of the training time. Speculative decoding (SD) accelerates autoregressive generation in serving systems, but its behavior under RL training remains largely unexplored. We identify three critical gaps that hinder the naive integration of SD into RL systems: diminishing speedups at large batch sizes, drafter staleness under continual actor updates, and drafter-induced policy degradation. To address these gaps, we present ReSpec, a system that adapts SD to RL through three complementary mechanisms: dynamically tuning SD configurations, evolving the drafter via knowledge distillation, and weighting updates by rollout rewards. On Qwen models (3B-14B), ReSpec achieves up to 4.5x speedup while preserving reward convergence and training stability, providing a practical solution for efficient RL-based LLM adaptation.
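
To make the idea of tuning SD configurations concrete, the sketch below picks a draft length from the standard analytical speedup model for speculative decoding. It is not ReSpec's tuner, which the abstract does not describe. The function names, the `max_gamma` bound, and the inputs are hypothetical. The model assumes an i.i.d. per-token acceptance rate `alpha` and that verifying a drafted block costs about one target-model step, an assumption that weakens once large batches make decoding compute-bound.

```python
def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Estimated speedup of speculative decoding over plain decoding.

    alpha:      assumed i.i.d. probability that a drafted token is accepted
    gamma:      number of tokens drafted per verification step (0 = SD off)
    cost_ratio: drafter step time divided by target step time
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    # Expected tokens produced per verification step (geometric series).
    expected_tokens = (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)
    # Cost of one step: gamma drafter calls plus one target verification,
    # measured in units of one target step.
    step_cost = gamma * cost_ratio + 1.0
    return expected_tokens / step_cost


def best_draft_length(alpha: float, cost_ratio: float, max_gamma: int = 8) -> int:
    """Return the draft length in [0, max_gamma] with the highest estimate."""
    return max(
        range(max_gamma + 1),
        key=lambda g: expected_speedup(alpha, g, cost_ratio),
    )
```

A tuner along these lines would re-estimate `alpha` and `cost_ratio` from recent rollouts at the current batch size and call `best_draft_length` again. A result of 0 means the estimate favors disabling SD for that configuration.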
