LG · AI · CL · Jun 5

OffQ: Taming Structured Outliers in LLM Quantization by Offsetting

arXiv:2606.07116 · 9.9
Predicted impact: top 22% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners deploying large language models with low-bit quantization, OffQ addresses the bottleneck of activation outliers, enabling more accurate and efficient inference.

OffQ introduces a method to mitigate activation outliers in LLM quantization by identifying a low-dimensional outlier subspace via top-1 PCA, rotating activations to concentrate outliers into one channel, and absorbing that channel as a shared offset. This enables effective W4A4KV4 quantization with uniform precision, consistently improving model accuracy over state-of-the-art baselines across diverse LLM architectures.

Low-bit quantization has been widely adopted to accelerate the inference of large language models (LLMs) by significantly reducing computational cost and memory usage. However, activation outliers pose a major challenge to effective quantization, often leading to notable performance degradation. In this paper, we introduce OffQ, a method designed to mitigate activation outliers in low-bit quantization through a novel offsetting mechanism. Specifically, OffQ first identifies a low-dimensional outlier subspace in the activations using a proposed top-1 PCA, and then concentrates high-magnitude activations into a single channel via rotation. OffQ then absorbs this concentrated outlier channel by converting its magnitude into a shared offset, thereby reducing the standard deviation of the activations. This offsetting strategy enables effective W4A4KV4 quantization of LLMs using deployment-friendly uniform-grid and uniform-precision quantization. Extensive experiments across diverse LLM architectures and benchmarks demonstrate that OffQ outperforms state-of-the-art baselines, consistently improving model accuracy while preserving low-bit efficiency.
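
The three steps named in the abstract (top-1 PCA, rotation into one channel, shared offset) can be illustrated with a small NumPy sketch on synthetic data. This is not the authors' implementation. The following are assumptions made purely for illustration: the leading singular vector of the uncentered activations stands in for the "top-1 PCA" direction, a Householder reflection stands in for the rotation, the shared offset is taken to be the mean of the concentrated channel, the quantizer is a per-token symmetric 4-bit fake-quantizer, and the helper names (`top1_direction`, `householder_to_e0`, `fake_quant`, `offset_roundtrip`) are hypothetical.

```python
import numpy as np

def top1_direction(X):
    # Assumption: leading right-singular vector of the uncentered
    # activations plays the role of the "top-1 PCA" outlier direction.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def householder_to_e0(v):
    # Orthogonal, symmetric matrix H with H @ v = e0 for a unit vector v.
    # Assumption: a reflection is used here in place of the paper's rotation.
    d = v.shape[0]
    e0 = np.zeros(d)
    e0[0] = 1.0
    u = v - e0
    norm = np.linalg.norm(u)
    if norm < 1e-12:
        return np.eye(d)
    u = u / norm
    return np.eye(d) - 2.0 * np.outer(u, u)

def fake_quant(X, bits=4):
    # Symmetric, per-token (per-row) uniform quantize-then-dequantize.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(X).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)
    return np.clip(np.round(X / scale), -qmax - 1, qmax) * scale

def offset_roundtrip(X, bits=4):
    v = top1_direction(X)
    H = householder_to_e0(v)
    Xr = X @ H                       # outlier direction now lives in channel 0
    offset = np.zeros(X.shape[1])
    offset[0] = Xr[:, 0].mean()      # assumed form of the shared offset
    Xc = Xr - offset                 # offset is kept in full precision
    X_hat = (fake_quant(Xc, bits) + offset) @ H   # H is its own inverse
    return X_hat, Xr, Xc

# Synthetic activations: unit Gaussian noise plus one large shared direction.
rng = np.random.default_rng(0)
n_tokens, d_model = 256, 64
v_true = rng.normal(size=d_model)
v_true /= np.linalg.norm(v_true)
X = rng.normal(size=(n_tokens, d_model)) + 50.0 * v_true

err_naive = np.mean((fake_quant(X) - X) ** 2)
X_hat, Xr, Xc = offset_roundtrip(X)
err_offset = np.mean((X_hat - X) ** 2)

print("std before offset:", Xr.std(), "after:", Xc.std())
print("4-bit MSE naive:", err_naive, "with rotation + offset:", err_offset)
```

On this toy input, removing the shared component shrinks the per-token dynamic range, so the same 4-bit uniform grid spends its levels on the remaining small-magnitude values. Whether real LLM activations behave like this synthetic example, and how the offset is folded into the surrounding layers, is for the paper itself to establish.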

