Is ShortGPT superseded?

ShortGPT (Mixture-of-experts routing) is superseded: it is cited as a baseline and beaten by newer methods. No papers critique it and 2 report beating it on benchmarks (#88 of 1,370 most-superseded). Sub-problem: the cluster led by LLaMA-MoE. Newer alternatives in the same sub-problem include DOT-MoE, ZEDA, ExpertWeaver, Mixture-of-Experts (MoE) Adaptation, and Elastic Mixture-of-Experts (EMoE).

Superseded baseline · #88 of 1,370 most-superseded

ShortGPT

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Mixture-of-experts routing · first seen Mar 6, 2024

superseded — cited as a baseline and beaten by newer methods

0 papers critique it · 2 beat it on benchmarks

Beaten on benchmarks

Head-to-head results where a newer method reports beating ShortGPT. Values are copied from the source paper's tables — verify against the cited paper.

ToMoE beats ShortGPT · Average [LLaMA-2 7B @ 60% active parameters]
60.72 (ToMoE) vs 47.07 (ShortGPT)
Source: ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning

DOT-MoE beats ShortGPT · Avg. [Qwen2.5 7B, 4.76B params]
72.3 (DOT-MoE) vs 45.4 (ShortGPT)
Source: DOT-MoE: Differentiable Optimal Transport for MoEfication

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

DOT-MoE · DOT-MoE: Differentiable Optimal Transport for MoEfication · Jun 1, 2026
ZEDA · Post-Trained MoE Can Skip Half Experts via Self-Distillation · May 18, 2026
ExpertWeaver · ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns · Feb 17, 2026
Mixture-of-Experts (MoE) Adaptation · Understanding and Harnessing Sparsity in Unified Multimodal Models · Dec 2, 2025
Elastic Mixture-of-Experts (EMoE) · Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts · Sep 26, 2025