Living systematic review
Mixture-of-experts routing
Scaling LLM capacity with sparsely activated experts: routing, load balancing, and fine-grained expert design.
655 papers · 1,456 critique receipts · 4,254 benchmark results · updated Jun 18, 2026
Most-superseded baselines
Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.
- 2. Switch Transformer · Switch Transformer · Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
5 papers critique it · 7 beat it on benchmarks
- 4. HydraLoRA · HydraLoRA · HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
1 paper critiques it · 8 beat it on benchmarks
- 5. Fiddler · MC-SMoE · Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
7 papers critique it · 2 beat it on benchmarks
- 6. LoRAMoE · HydraLoRA · LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
3 papers critique it · 5 beat it on benchmarks
- 9. ReMoE · ReMoE · ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
3 papers critique it · 4 beat it on benchmarks
- 10. Soft MoE · ReMoE · From Sparse to Soft Mixtures of Experts
3 papers critique it · 4 beat it on benchmarks
- 12. Tutel · Switch Transformer · Tutel: Adaptive Mixture-of-Experts at Scale
6 papers critique it · 1 beats it on benchmarks
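The ranking criterion described at the top of this list (how many distinct papers critique or beat each method) can be sketched as below. This is only an illustration: the record schema, the paper ids, and the `rank_superseded` helper are hypothetical, and whether the review counts a paper once when it both critiques and beats a method is an assumption made here, not something the page states.

```python
from collections import defaultdict

# Hypothetical records: (citing_paper_id, target_method, relation).
# The ids and the tuple layout are illustrative assumptions, not the
# review's actual data model.
records = [
    ("paper_a", "Switch Transformer", "critiques"),
    ("paper_a", "Switch Transformer", "beats"),  # same paper, counted once
    ("paper_b", "Switch Transformer", "beats"),
    ("paper_c", "Tutel", "critiques"),
]


def rank_superseded(records):
    """Rank methods by the number of distinct papers that critique or beat them."""
    papers_by_method = defaultdict(set)
    for paper_id, method, relation in records:
        if relation in {"critiques", "beats"}:
            # A set deduplicates papers that appear under both relations.
            papers_by_method[method].add(paper_id)
    # Sort by descending distinct-paper count, then by name for stable ties.
    return sorted(
        ((method, len(papers)) for method, papers in papers_by_method.items()),
        key=lambda item: (-item[1], item[0]),
    )


ranking = rank_superseded(records)
```

Using a set per method is what makes the count "distinct": a paper that both critiques and outperforms a baseline contributes one vote rather than two.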
Sub-problems
Methods that compete on the same benchmarks cluster into distinct sub-problems.
Switch Transformer · 143 methods
Switch Transformer · Tutel · X-MoE · DeepSpeed-MoE · GShard · FlexMoE
Expert Choice · 34 methods
Expert Choice · MoE-LLaVA · ST-MoE · Loss-free balancing · auxiliary losses · MoCLE
U-Mamba · 28 methods
U-Mamba · Vision Transformers · DeblurGAN-v2 · GLARE · Retinexformer · VQCNIR
The frontier
Recent methods not yet superseded in the knowledge base.
- PADD · PADD: Path-Aligned Decompression Distillation for Non-Router Teacher to Guide MoE Student Learning · Jun 9, 2026
- Jun 4, 2026
- Jun 1, 2026
- GC-MoE · GC-MoE: Genomics-Guided Cell-Type-Specific Mixture of Experts for Histology-Based Single-Cell Spatial Transcriptomics · Jun 1, 2026
- Jun 1, 2026
- Jun 1, 2026
- May 31, 2026
- May 30, 2026
- May 30, 2026
- May 29, 2026
- May 27, 2026
- MoE-to-dense conversion framework · Pruning and Distilling Mixture-of-Experts into Dense Language Models · May 27, 2026