Superseded baseline · #47 of 80 most-superseded
QServe
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
LLM quantization · first seen May 7, 2024
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites QServe as a baseline.
“QServe [Qserve] concludes that W4A4 cannot deliver speedup on Ampere and retreats to W4A8.”
— APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
“the state-of-the-art W4A8 GEMM implementation [lin2024qserve] fails to meet expectations: it does not outperform higher-precision methods like W8A8 in memory-bound scenarios and is significantly slower than W8A8 and even FP16 in compute-bound regimes”
— LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 7, 2026