QuaRot
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs · LLM quantization · first seen Mar 30, 2024
heavily superseded — a standard baseline that newer methods routinely beat
7 papers critique it · 12 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites QuaRot as a baseline.
“QuaRot reporting an accuracy loss of approximately 3.5 points”
— OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
“Interestingly, we can prove analytically and show empirically that rotations improve MXFP4 accuracy, but hurt NVFP4 accuracy when coupled with standard Round-to-Nearest (RTN) quantization.”
— Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
“Yet, these rotations introduce quadratic complexity, which offsets the potential acceleration.”
— ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers
“QuaRot, SpinQuant, and ButterflyQuant do not engage with directly [the regime of per-head q_norm/RoPE compatibility failures]”
— Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization
“However, these methods operate primarily along the feature dimension and ignore correlations across the sequence dimension.”
— STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
“QuaRot fails on Qwen models smaller than 14B, suggesting that naive rotation alone is insufficient to suppress quantization error in the presence of severe outliers in small models”
— OffQ: Taming Structured Outliers in LLM Quantization by Offsetting
“However, these predetermined rotations cannot adapt to specific models.”
— ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
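Several of these critiques turn on what a fixed rotation does to activations before 4-bit rounding. The following is a minimal toy sketch of that idea, not QuaRot's implementation: it assumes a plain Sylvester Hadamard matrix, a single planted outlier channel, and per-tensor symmetric round-to-nearest quantization. The helper names (`hadamard`, `matvec`, `quantize_rtn`, `mse`) are illustrative only.

```python
import math
import random


def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in h]


def matvec(m, x):
    """Dense matrix-vector product; costs O(n^2) multiply-adds."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def quantize_rtn(x, bits=4):
    """Symmetric per-tensor quantize-dequantize using Python's round()."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in x) / qmax
    if scale == 0:
        return list(x)
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
n = 64
x = [random.gauss(0, 1) for _ in range(n)]  # toy activation vector
x[3] = 50.0                                 # planted outlier channel (assumption)
w = [random.gauss(0, 1) for _ in range(n)]  # toy weight row

Q = hadamard(n)
x_rot = matvec(Q, x)
w_rot = matvec(Q, w)

# Q is orthogonal, so rotating both operands leaves the dot product unchanged.
dot_plain = sum(a * b for a, b in zip(x, w))
dot_rot = sum(a * b for a, b in zip(x_rot, w_rot))

# The rotation spreads the outlier over all channels, which shrinks the
# quantization step and the 4-bit rounding error in this toy setting.
err_plain = mse(x, quantize_rtn(x))
err_rot = mse(x_rot, quantize_rtn(x_rot))

print(f"dot product: plain={dot_plain:.6f} rotated={dot_rot:.6f}")
print(f"max |x|: plain={max(map(abs, x)):.2f} rotated={max(map(abs, x_rot)):.2f}")
print(f"4-bit MSE: plain={err_plain:.4f} rotated={err_rot:.4f}")
```

The dense `matvec` here is quadratic in the hidden size, which is the cost the ConvRot quote refers to; a fast Walsh-Hadamard transform would bring it down to O(n log n). The rotation is also fixed in advance and acts only along the feature dimension, which is what the ButterflyQuant and STaMP quotes take issue with.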
Beaten on benchmarks
Head-to-head results where a newer method reports beating QuaRot. Values are copied from the source paper's tables — verify against the cited paper.
- TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats QuaRot · Avg. [W4A4]
65.12 vs 51.83
- QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution
QuantVSR beats QuaRot · PSNR [REDS4, W4A4]
23.31 vs 20.21
- QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution
QuantVSR beats QuaRot · PSNR [SPMCS, W4A4]
22.76 vs 20.16
- QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution
QuantVSR beats QuaRot · PSNR [MVSR4x, W4A4]
21.18 vs 21.00
- STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats QuaRot · Avg. [W4A4 (4-bit weight and activation) on LLADA-8B]
57.07 vs 51.03
- STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats QuaRot · Avg. [W4A4 (4-bit weight and activation) on LLADA-1.5-8B]
66.93 vs 61.06
- STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats QuaRot · Avg. [W4A4 (4-bit weight and activation) on DREAM-7B]
63.59 vs 58.85
- SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
SpecQuant beats QuaRot · 0-shot^9 [4-4-16]
64.75 vs 61.69
- SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
SpecQuant beats QuaRot · 0-shot^9 [4-4-4]
64.75 vs 61.38
- Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
MicroRotated-GPTQ beats QuaRot · Avg [MXFP4 W4A4]
73.65 vs 62.90
- What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study
Reasoning-QAT beats QuaRot · Avg. [Qwen3-0.6B W4A4KV4]
21.44 vs 4.84
- What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study
Reasoning-QAT beats QuaRot · Avg. [R1-1.5B W4A4KV4]
41.31 vs 2.11
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 5, 2026
- FAIR-Calib · FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models · Jun 4, 2026
- May 25, 2026
- May 11, 2026
- Activation Residual Hessian Quantization (ARHQ) · Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization · Apr 30, 2026
- Apr 20, 2026
- Apr 14, 2026
- Mar 26, 2026
- Jan 29, 2026
- Reasoning-QAT · What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study · Jan 21, 2026
- Dec 3, 2025
- adaptive transformation selection framework · Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models · Nov 21, 2025