LG, AI, May 8

LoopQ: Quantization for Recursive Transformers

arXiv:2605.16343
AI Analysis

Enables efficient quantization of recursive Transformer models for parameter-efficient language modeling, addressing a previously unstudied problem.

LoopQ addresses the fragility of looped language models under post-training quantization, achieving a 68.8% improvement in downstream accuracy and 87.7% reduction in perplexity under W4A4 quantization compared to the strongest static PTQ baseline.

Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization (PTQ). We present the first systematic study of quantization in LoopLMs and identify three challenges: distribution shift across roles, state reuse across loop transitions, and recursive error accumulation. To address these challenges, we propose LoopQ, a loop-aware PTQ framework that preserves a shared quantized backbone while introducing lightweight adaptations. LoopQ combines activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization to reduce distributional mismatch within loops and error accumulation across loops. Experiments across seven benchmarks show that, under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8% and reduces average perplexity by 87.7% compared with the strongest static PTQ baseline.
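As a rough illustration of the setting described above, and not of LoopQ itself, the sketch below applies one shared block several times with 4-bit weights and 4-bit activations (W4A4) and measures how far the quantized state drifts from the full-precision state after each loop. Everything here is an assumption made for illustration: the toy tanh block, the symmetric per-tensor quantizer, and the names fake_quantize, looped_forward, and per_loop_error are hypothetical and do not come from the paper.

```python
import math
import random


def fake_quantize(values, bits=4):
    """Symmetric per-tensor fake quantization (an illustrative choice):
    snap each value to a signed integer grid, then rescale to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def matvec(weight, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def looped_forward(weight, x, num_loops, quantize_activations=False, bits=4):
    """Apply the same toy block num_loops times and return every loop's state.
    The block is tanh(W x); a real LoopLM reuses full Transformer blocks."""
    states = []
    for _ in range(num_loops):
        if quantize_activations:
            x = fake_quantize(x, bits)  # the reused state is re-quantized each loop
        x = [math.tanh(v) for v in matvec(weight, x)]
        states.append(x)
    return states


def per_loop_error(dim=16, num_loops=4, bits=4, seed=0):
    """L2 distance between full-precision and W4A4 states after each loop."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
              for _ in range(dim)]
    x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Quantize the shared weights once; every loop reuses the same copy.
    flat = fake_quantize([w for row in weight for w in row], bits)
    weight_q = [flat[i * dim:(i + 1) * dim] for i in range(dim)]

    reference = looped_forward(weight, x0, num_loops)
    quantized = looped_forward(weight_q, x0, num_loops,
                               quantize_activations=True, bits=bits)
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(r, q)))
            for r, q in zip(reference, quantized)]


if __name__ == "__main__":
    for loop_index, err in enumerate(per_loop_error(), start=1):
        print(f"loop {loop_index}: L2 error = {err:.4f}")
```

Because the weights are shared, the same quantization error re-enters the computation on every loop, and the perturbed state from one loop becomes the input to the next. Printing the per-loop error makes that trajectory visible; whether it grows or shrinks in this toy depends on the random weights and the nonlinearity, so it should not be read as evidence for the paper's reported numbers.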

