FlexRound
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
LLM quantization · first seen Jun 1, 2023
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 2 papers beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites FlexRound as a baseline.
“FlexRound incurs considerable performance degradation on the massive multitask language understanding (MMLU) benchmark”
— LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
Beaten on benchmarks
Head-to-head results where a newer method reports beating FlexRound. Values are copied from the source paper's tables — verify against the cited paper.
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · IFEval (greedy) [Qwen2.5-7B, W4]
71.35 vs 69.50
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · MATH500 (greedy) [Qwen2.5-7B, W4]
73.4 vs 72.6
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · IFEval (greedy) [Qwen2.5-7B, W3g128]
67.84 vs 66.54
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · MATH500 (greedy) [Qwen2.5-7B, W3g128]
68.0 vs 65.6
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · IFEval (greedy) [Qwen2.5-14B, W4]
78.00 vs 77.82
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · MATH500 (greedy) [Qwen2.5-14B, W4]
77.2 vs 76.4
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · IFEval (greedy) [Qwen2.5-14B, W3g128]
77.08 vs 75.05
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ (Ours) beats FlexRound · MATH500 (greedy) [Qwen2.5-14B, W3g128]
71.6 vs 69.6
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ beats FlexRound · IFEval (greedy) [Llama 3.1 8B, W4, FlexRound]
72.09 vs 70.24
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
FlexRound+LFQ beats FlexRound · GSM8K (greedy) [Llama 3.1 8B, W4, FlexRound]
81.80 vs 81.35
- LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
LRQ (Ours) beats FlexRound · MMLU Average Accuracy [Llama 2 7B, 4/8/8]
45.36 vs 45.14
- LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
LRQ (Ours) beats FlexRound · MMLU Average Accuracy [Llama 2 13B, 4/8/8]
54.49 vs 53.77
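The size of each reported gap is easier to compare when computed side by side. The following is a minimal sketch that only does arithmetic on the values listed above; the names `results` and `margins` are hypothetical, and the gaps say nothing about run-to-run variance or statistical significance. Gaps on different benchmarks are not directly comparable.

```python
# (newer_method_score, flexround_score) pairs copied from the entries above.
# Keys are (paper, benchmark, model/config); the layout is illustrative only.
results = {
    ("LFQ", "IFEval", "Qwen2.5-7B, W4"): (71.35, 69.50),
    ("LFQ", "MATH500", "Qwen2.5-7B, W4"): (73.4, 72.6),
    ("LFQ", "IFEval", "Qwen2.5-7B, W3g128"): (67.84, 66.54),
    ("LFQ", "MATH500", "Qwen2.5-7B, W3g128"): (68.0, 65.6),
    ("LFQ", "IFEval", "Qwen2.5-14B, W4"): (78.00, 77.82),
    ("LFQ", "MATH500", "Qwen2.5-14B, W4"): (77.2, 76.4),
    ("LFQ", "IFEval", "Qwen2.5-14B, W3g128"): (77.08, 75.05),
    ("LFQ", "MATH500", "Qwen2.5-14B, W3g128"): (71.6, 69.6),
    ("LFQ", "IFEval", "Llama 3.1 8B, W4, FlexRound"): (72.09, 70.24),
    ("LFQ", "GSM8K", "Llama 3.1 8B, W4, FlexRound"): (81.80, 81.35),
    ("LRQ", "MMLU", "Llama 2 7B, 4/8/8"): (45.36, 45.14),
    ("LRQ", "MMLU", "Llama 2 13B, 4/8/8"): (54.49, 53.77),
}


def margins(table):
    """Return newer-method score minus FlexRound score per entry, rounded to 2 places."""
    return {key: round(new - base, 2) for key, (new, base) in table.items()}


gaps = margins(results)
smallest = min(gaps, key=gaps.get)  # entry with the narrowest reported gap
largest = max(gaps, key=gaps.get)   # entry with the widest reported gap

for key, gap in sorted(gaps.items(), key=lambda item: item[1]):
    paper, benchmark, config = key
    print(f"{gap:5.2f}  {paper}  {benchmark}  [{config}]")
```

Sorting by gap makes it visible that the listed differences range from a fraction of a point to a few points, which is worth keeping in mind when reading "beats" in the entries above.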