Superseded baseline · #88 of 1,370 most-superseded
ShortGPT
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Mixture-of-experts routing · first seen Mar 6, 2024
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 2 beat it on benchmarks
Beaten on benchmarks
Head-to-head results where a newer method reports beating ShortGPT. Values are copied from the source paper's tables — verify against the cited paper.
- ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning
ToMoE beats ShortGPT · Average [LLaMA-2 7B @ 60% active parameters]
60.72 vs 47.07
- DOT-MoE: Differentiable Optimal Transport for MoEfication
DOT-MoE beats ShortGPT · Avg. [Qwen2.5 7B, 4.76B params]
72.3 vs 45.4
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 1, 2026
- May 18, 2026
- Feb 17, 2026
- Mixture-of-Experts (MoE) Adaptation · Understanding and Harnessing Sparsity in Unified Multimodal Models · Dec 2, 2025
- Elastic Mixture-of-Experts (EMoE) · Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts · Sep 26, 2025