Is QServe superseded?

QServe (LLM quantization): superseded (cited as a baseline and beaten by newer methods). 2 papers critique it and 0 beat it on benchmarks; it ranks #47 of 80 most-superseded. Sub-problem: cluster led by QServe. Newer alternatives in the same sub-problem include APEX4.


Superseded baseline · #47 of 80 most-superseded

QServe

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

LLM quantization · first seen May 7, 2024

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 0 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites QServe as a baseline.

“QServe [Qserve] concludes that W4A4 cannot deliver speedup on Ampere and retreats to W4A8.”
— APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
“the state-of-the-art W4A8 GEMM implementation [lin2024qserve] fails to meet expectations: it does not outperform higher-precision methods like W8A8 in memory-bound scenarios and is significantly slower than W8A8 and even FP16 in compute-bound regimes”
— LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
Jun 7, 2026