Living systematic review

Mixture-of-experts routing

Scaling LLM capacity with sparsely activated experts: routing, load balancing, and fine-grained expert design.

655 papers · 1,456 critique receipts · 4,254 benchmark results · updated Jun 18, 2026

Most-superseded baselines

Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.

1. MC-SMoE · MC-SMoE
   8 papers critique it · 12 beat it on benchmarks
2. Switch Transformer · Switch Transformer
   Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
   5 papers critique it · 7 beat it on benchmarks
3. NAEE · MC-SMoE
   5 papers critique it · 6 beat it on benchmarks
4. HydraLoRA · HydraLoRA
   HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
   1 paper critiques it · 8 beat it on benchmarks
5. Fiddler · MC-SMoE
   Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
   7 papers critique it · 2 beat it on benchmarks
6. LoRAMoE · HydraLoRA
   LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
   3 papers critique it · 5 beat it on benchmarks
7. BTX · BTX
   2 papers critique it · 5 beat it on benchmarks
8. HC-SMoE · MC-SMoE
   3 papers critique it · 4 beat it on benchmarks
9. ReMoE · ReMoE
   ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
   3 papers critique it · 4 beat it on benchmarks
10. Soft MoE · ReMoE
    From Sparse to Soft Mixtures of Experts
    3 papers critique it · 4 beat it on benchmarks
11. Mixtral-Offloading · MC-SMoE
    5 papers critique it · 2 beat it on benchmarks
12. Tutel · Switch Transformer
    Tutel: Adaptive Mixture-of-Experts at Scale
    6 papers critique it · 1 beats it on benchmarks
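
The ranking rule described above the list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page does not say how the two counts are combined or how ties are broken. The sketch assumes the score is the sum of the "critique" and "beat" counts, with ties broken by the larger "beat" count, which happens to reproduce the order shown. If a paper both critiques and beats a method, the true count of distinct papers would be lower than this sum. The names `BASELINES` and `rank_baselines` are hypothetical.

```python
# Counts copied from the list above: (method, papers critiquing it, papers beating it).
# Rows are deliberately given in alphabetical order, not rank order.
BASELINES = [
    ("BTX", 2, 5),
    ("Fiddler", 7, 2),
    ("HC-SMoE", 3, 4),
    ("HydraLoRA", 1, 8),
    ("LoRAMoE", 3, 5),
    ("MC-SMoE", 8, 12),
    ("Mixtral-Offloading", 5, 2),
    ("NAEE", 5, 6),
    ("ReMoE", 3, 4),
    ("Soft MoE", 3, 4),
    ("Switch Transformer", 5, 7),
    ("Tutel", 6, 1),
]


def rank_baselines(rows):
    """Sort by combined count (descending), then by 'beat' count (descending).

    Both the summed score and the tie-break are assumptions inferred from the
    listed order; the page does not state them. Python's sort is stable, so
    rows with identical counts keep their input order.
    """
    return sorted(rows, key=lambda row: (-(row[1] + row[2]), -row[2]))


if __name__ == "__main__":
    for rank, (name, critiques, beats) in enumerate(rank_baselines(BASELINES), start=1):
        print(f"{rank:>2}  {name:<20} {critiques + beats:>3}")
```

Under these assumptions the sketch places MC-SMoE first with a combined count of 20, and the six methods tied at 7 are separated only by how many papers beat them.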

Methods that compete on the same benchmarks cluster into distinct sub-problems.

MC-SMoE · 220 methods

HydraLoRA · 145 methods

ReMoE · 149 methods

Switch Transformer · 143 methods

BTX · 80 methods

LLaMA-MoE · 60 methods

SteerMoE · 48 methods

Expert Choice · 34 methods

MoEQuant · 30 methods

U-Mamba · 28 methods

Recent methods not yet superseded in the knowledge base.