Superseded baseline · #81 of 1,370 most-superseded
Hessian
Mixture-of-experts routing
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 2 beat it on benchmarks
Beaten on benchmarks
Head-to-head results where a newer method reports beating Hessian. Values are copied from the source paper's tables — verify against the cited paper.
- Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
Router norm + Max var (Ours) beats Hessian · Avg. [Mixtral 8x7B, 2.5 bits/expert]
68.38 vs 67.18
- MC#: Mixture Compressor for Mixture-of-Experts Large Models
PMQ beats Hessian · Avg. (%) [Mixtral 8×7b at 2.54-bit]
67.50 vs 67.18
- MC#: Mixture Compressor for Mixture-of-Experts Large Models
PMQ beats Hessian · Avg. (%) [DeepSeek-VL2-L at 2.57-bit]
70.60 vs 67.79
- MC#: Mixture Compressor for Mixture-of-Experts Large Models
PMQ beats Hessian · Avg. (%) [DeepSeek-VL2-S at 2.58-bit]
63.66 vs 61.79
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 22, 2026
- May 21, 2026
- KBVQ-MoE · KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models · Jan 30, 2026
- Oct 13, 2025