LG · Nov 19, 2025

The Impact of Quantization on Large Reasoning Model Reinforcement Learning

arXiv:2511.15694v1 · 2 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work studies how quantization affects reinforcement learning for large reasoning models. The contribution is incremental, as it builds on existing quantization techniques.

The study investigated how quantization affects large reasoning models trained via reinforcement learning, finding that quantization-aware RL training harmed performance while post-training quantization and QLoRA methods yielded better results on mathematical benchmarks.

Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reasoning models (LRMs) remains an open question. To answer this question, we conducted systematic experiments and discovered a significant gap in reasoning performance on mathematical benchmarks between post-RL quantized models and their counterparts optimized with quantization-aware RL. Our findings suggest that quantization-aware RL training negatively impacted the learning process, whereas PTQ and QLoRA yielded better performance.
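To make the PTQ/QAT distinction concrete, here is a minimal toy sketch. It is not the paper's RL setup or quantizer. It assumes symmetric per-tensor round-to-nearest quantization and a straight-through estimator (STE) for QAT. The functions `quantize_symmetric`, `fake_quantize`, and `train` are hypothetical illustrations. PTQ trains in full precision and quantizes once at the end. QAT computes the loss on fake-quantized weights while updating full-precision master weights. The toy only shows where quantization enters the loop and makes no claim to reproduce the paper's results.

```python
# Toy contrast between post-training quantization (PTQ) and
# quantization-aware training (QAT) with a straight-through estimator.
# Hypothetical illustration only; not the paper's quantizer or RL setup.

def quantize_symmetric(weights, num_bits=4):
    """Symmetric per-tensor round-to-nearest quantization to signed ints."""
    qmax = 2 ** (num_bits - 1) - 1          # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; clamp to the signed range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def fake_quantize(weights, num_bits=4):
    """Quantize then dequantize, as seen by a QAT forward pass."""
    q, scale = quantize_symmetric(weights, num_bits)
    return [v * scale for v in q]

def train(target, steps=200, lr=0.05, qat=False, num_bits=4):
    """Fit weights to `target` by gradient descent on squared error."""
    w = [0.1] * len(target)                  # full-precision master weights
    for _ in range(steps):
        # QAT: the loss is computed on fake-quantized weights.
        used = fake_quantize(w, num_bits) if qat else w
        # Straight-through estimator: treat d(used)/dw as 1.
        grads = [2 * (u - t) for u, t in zip(used, target)]
        w = [wi - lr * g for wi, g in zip(w, grads)]
    return w

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

target = [0.83, -0.41, 0.27, -0.95, 0.06]

# PTQ: train in full precision, then quantize once afterwards.
ptq_weights = fake_quantize(train(target, qat=False))
# QAT: quantization is inside the training loop.
qat_weights = fake_quantize(train(target, qat=True))

print("PTQ error:", mse(ptq_weights, target))
print("QAT error:", mse(qat_weights, target))
```

In this sketch, both variants are deployed as quantized weights. They differ only in whether the optimizer saw the quantization error during training.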
