LG | AI | Jan 29

MixQuant: Pushing the Limits of Block Rotations in Post-Training Quantization

arXiv:2601.22347v1 | 1 citation | h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses efficient post-training quantization of large language models. It offers a method that improves quantized-model accuracy without adding inference overhead, though the advance is incremental because it builds on existing block rotation techniques.

The paper tackles the problem of outlier suppression in post-training quantization by analyzing block rotations and introduces MixQuant, a framework that redistributes activation mass via permutations to improve accuracy, recovering up to 90% of full-vector rotation perplexity for INT4 quantization of Llama3 1B with block size 16.
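The dependence on how mass is spread across blocks can be illustrated with a small NumPy sketch. It is not the paper's code: the helper names `hadamard` and `block_rotate`, the toy vectors, and the block size of 4 are all assumptions chosen for illustration. For an orthonormal Hadamard block of size b, every rotated entry is bounded in magnitude by the block's l1 norm divided by sqrt(b), so a block that holds most of the mass keeps a large peak.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix via the Sylvester construction (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def block_rotate(x, block_size):
    """Rotate each contiguous block of x independently with the same Hadamard matrix."""
    H = hadamard(block_size)
    blocks = x.reshape(-1, block_size)
    return (blocks @ H.T).reshape(-1)

# Hypothetical toy inputs: the same two outliers, placed differently.
clustered = np.array([8.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # both in block 0
spread = np.array([8.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0])     # one per block

max_clustered = np.abs(block_rotate(clustered, 4)).max()
max_spread = np.abs(block_rotate(spread, 4)).max()
print(max_clustered, max_spread)
```

With both outliers in one block the largest rotated entry stays at 8 (block l1 norm 16 divided by sqrt(4)), while placing one outlier per block brings it down to 4. A permutation that moves the second outlier into the other block is exactly the kind of redistribution described above.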

Recent post-training quantization (PTQ) methods have adopted block rotations to diffuse outliers prior to rounding. While this reduces the overhead of full-vector rotations, the effect of block structure on outlier suppression remains poorly understood. To fill this gap, we present the first systematic, non-asymptotic analysis of outlier suppression for block Hadamard rotations. Our analysis reveals that outlier suppression is fundamentally limited by the geometry of the input vector. In particular, post-rotation outliers are deterministically minimized when the pre-rotation $\ell_1$ norm mass is evenly distributed across blocks. Guided by these insights, we introduce MixQuant, a block rotation-aware PTQ framework that redistributes activation mass via permutations prior to rotation. We propose a greedy mass diffusion algorithm to calibrate permutations by equalizing the expected blockwise $\ell_1$ norms. To avoid adding inference overhead, we identify permutation-equivariant regions in transformer architectures to merge the resulting permutations into model weights before deployment. Experiments show that MixQuant consistently improves accuracy across all block sizes, recovering up to 90% of the full-vector rotation perplexity when quantizing Llama3 1B to INT4 with block size 16, compared to 46% without permutations.
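As a rough illustration of the calibration and merging steps, the sketch below balances blockwise mass with a simple heaviest-first greedy assignment and then checks that a channel permutation can be absorbed into a weight matrix. The abstract does not spell out the greedy mass diffusion algorithm, so this heuristic is an assumed stand-in, not the authors' procedure. The function `greedy_balance`, the synthetic activations `X`, and the weight `W` are hypothetical.

```python
import numpy as np

def greedy_balance(channel_mass, block_size):
    """Return a channel permutation whose contiguous blocks have roughly equal total mass.

    Assumed heuristic: visit channels from heaviest to lightest and place each
    one in the lightest block that still has a free slot.
    """
    n = len(channel_mass)
    assert n % block_size == 0, "channel count must be a multiple of block_size"
    n_blocks = n // block_size
    block_sum = np.zeros(n_blocks)
    members = [[] for _ in range(n_blocks)]
    for ch in np.argsort(-channel_mass):  # heaviest channel first
        open_blocks = [b for b in range(n_blocks) if len(members[b]) < block_size]
        b = min(open_blocks, key=lambda i: block_sum[i])
        members[b].append(int(ch))
        block_sum[b] += channel_mass[ch]
    return np.array([ch for blk in members for ch in blk])

# Synthetic calibration activations with four outlier channels in the first block.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
X[:, :4] *= 20.0

mass = np.abs(X).mean(axis=0)  # estimate of expected |activation| per channel
perm = greedy_balance(mass, block_size=4)

before = mass.reshape(-1, 4).sum(axis=1)       # blockwise l1 mass, original order
after = mass[perm].reshape(-1, 4).sum(axis=1)  # blockwise l1 mass, permuted order
print(before.round(1), after.round(1))

# Merging: permuting input channels and the matching weight rows leaves the
# layer output unchanged, so no permutation needs to run at inference time.
W = rng.normal(size=(16, 8))
y_ref = X @ W
y_perm = X[:, perm] @ W[perm, :]
```

The last two lines show only the basic identity behind merging. In a real transformer the permutation would also have to be folded into the layer that produces `X`, which is why the abstract restricts merging to permutation-equivariant regions.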

