CL AI Jan 14

Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats

arXiv:2601.09555v1, 3 citations, h-index: 7
Originality: Incremental advance
AI Analysis

This work provides practical guidance for adapting PTQ methods to MXFP formats, addressing a gap in low-precision quantization for LLMs. It is incremental, however, because it benchmarks existing algorithms rather than introducing new ones.

This paper systematically investigates post-training quantization (PTQ) for large language models (LLMs) under Microscaling Floating-Point (MXFP) formats. It finds that MXFP8 achieves near-lossless performance while MXFP4 causes significant accuracy degradation, and it identifies the scaling factor as a key error source in MXFP4 that pre-scale optimization can mitigate.

Microscaling Floating-Point (MXFP) has emerged as a promising low-precision format for large language models (LLMs). Although various post-training quantization (PTQ) algorithms have been proposed, they mostly focus on integer quantization, and their applicability and behavior under MXFP formats remain largely unexplored. To address this gap, this work conducts a systematic investigation of PTQ under MXFP formats, encompassing over 7 PTQ algorithms, 15 evaluation benchmarks, and 3 LLM families. The key findings include: 1) MXFP8 consistently achieves near-lossless performance, while MXFP4 introduces substantial accuracy degradation and remains challenging; 2) PTQ effectiveness under MXFP depends strongly on format compatibility, with some algorithmic paradigms being consistently more effective than others; 3) PTQ performance exhibits highly consistent trends across model families and modalities; in particular, quantization sensitivity in multimodal LLMs is dominated by the language model rather than the vision encoder; 4) the quantization scaling factor is a critical error source in MXFP4, and a simple pre-scale optimization strategy can significantly mitigate its impact. Together, these results provide practical guidance on adapting existing PTQ methods to MXFP quantization.
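
To make the role of the shared scaling factor concrete, here is a minimal pure-Python sketch of MXFP4-style block quantization. It assumes the usual MX layout (blocks of 32 elements, FP4 E2M1 element values, one power-of-two scale per block). The helper `quantize_block_searched` is hypothetical and purely illustrative: it tries neighbouring scale exponents and keeps the one with the lowest block error. It is not the paper's pre-scale optimization, whose details the abstract does not give.

```python
import math

# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2      # exponent of the largest element value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32    # number of elements sharing one scale (assumed MX layout)


def shared_exponent(block):
    """Default power-of-two scale exponent: floor(log2(max|x|)) - emax."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    return (e - 1) - E2M1_EMAX


def quantize_block(block, exp):
    """Quantize then dequantize one block using the scale 2**exp."""
    scale = 2.0 ** exp
    out = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_GRID[-1])  # saturate at 6.0
        # Nearest grid point; ties go to the smaller value here, whereas
        # real hardware typically rounds to nearest even.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q * scale, x))
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize_block_searched(block, offsets=(-1, 0, 1)):
    """Hypothetical helper: pick the exponent offset with the lowest block MSE."""
    base = shared_exponent(block)
    best = min(offsets, key=lambda d: mse(block, quantize_block(block, base + d)))
    return quantize_block(block, base + best), base + best


if __name__ == "__main__":
    block = [7.5] + [0.0] * (BLOCK_SIZE - 1)
    default = quantize_block(block, shared_exponent(block))
    searched, exp = quantize_block_searched(block)
    print("default MSE :", mse(block, default))
    print("searched MSE:", mse(block, searched), "with exponent", exp)
```

In this toy block the default exponent clips the value 7.5 to 6.0, while a scale one step larger represents it as 8.0. That is the kind of scale-induced error the abstract points to, although how much a real model gains depends on the actual weight and activation distributions.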

