DISP-LLM
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models · Mixture-of-experts routing · first seen Oct 15, 2024
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 2 beat it on benchmarks
Beaten on benchmarks
Head-to-head results in which a newer method reports beating DISP-LLM. Values are copied from each source paper's tables; verify them against the cited paper.
- ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning
  ToMoE beats DISP-LLM · Average
  - LLaMA-2 13B @ 70% active parameters: 68.26 vs 63.07
  - LLaMA-2 13B @ 60% active parameters: 65.00 vs 60.04
  - LLaMA-2 13B @ 50% active parameters: 61.13 vs 54.50
  - Qwen-2.5 14B @ 50% active parameters: 61.28 vs 59.04
  - Qwen-2.5 14B @ 40% active parameters: 56.06 vs 52.96
- DOT-MoE: Differentiable Optimal Transport for MoEfication
  DOT-MoE beats DISP-LLM · Avg.
  - LLaMA-2 7B, 3.49B params: 61.5 vs 57.4
  - LLaMA-3 8B, 3.80B params: 59.8 vs 52.7
  - Qwen2.5 7B, 4.76B params: 72.3 vs 66.7
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 1, 2026
- May 18, 2026
- Feb 17, 2026
- Mixture-of-Experts (MoE) Adaptation · Understanding and Harnessing Sparsity in Unified Multimodal Models · Dec 2, 2025
- Elastic Mixture-of-Experts (EMoE) · Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts · Sep 26, 2025