Superseded baseline · #30 of 80 most-superseded
ZeroQuant
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
LLM quantization · first seen Jun 4, 2022
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites ZeroQuant as a baseline.
“ZeroQuant incurs severe accuracy degradation for an open-source LLM”
— LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
“ZeroQuant requires 3.1 hours on a single A100 GPU to quantize an LLM with 1.3 billion parameters.”
— SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs
“However, both LLM.int8() and ZeroQuant are not efficient for quantizing LLMs to extreme low-percision [sic] number formats such as 3-bit integers.”
— AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs