Method Drift

Living systematic review

Mixture-of-experts routing

Scaling LLM capacity with sparsely activated experts: routing, load balancing, and fine-grained expert design.

655 papers · 1,456 critique receipts · 4,254 benchmark results · updated Jun 18, 2026

Most-superseded baselines

Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.

  1. MC-SMoE · MC-SMoE

    8 papers critique it · 12 beat it on benchmarks

  2. NAEE · MC-SMoE

    5 papers critique it · 6 beat it on benchmarks

  3. HydraLoRA · HydraLoRA
    HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

    1 paper critiques it · 8 beat it on benchmarks

  4. Fiddler · MC-SMoE
    Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

    7 papers critique it · 2 beat it on benchmarks

  5. BTX · BTX

    2 papers critique it · 5 beat it on benchmarks

  6. HC-SMoE · MC-SMoE

    3 papers critique it · 4 beat it on benchmarks

  7. ReMoE · ReMoE
    ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

    3 papers critique it · 4 beat it on benchmarks

  8. Soft MoE · ReMoE
    From Sparse to Soft Mixtures of Experts

    3 papers critique it · 4 beat it on benchmarks

  9.

    5 papers critique it · 2 beat it on benchmarks

  10. Tutel · Switch Transformer
    Tutel: Adaptive Mixture-of-Experts at Scale

    6 papers critique it · 1 beats it on benchmarks
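
The ranking criterion stated at the top of this list (distinct papers that critique or beat a method) can be sketched in a few lines. This is an illustration only, not the site's actual pipeline: the function name `rank_superseded`, the `(paper_id, method)` input shape, and the choice to count a paper once when it both critiques and beats a method are all assumptions, and the paper and method names in the example are placeholders.

```python
from collections import defaultdict


def rank_superseded(critiques, benchmark_wins):
    """Rank methods by how many distinct papers critique or beat them.

    Both arguments are iterables of (paper_id, method) pairs (assumed shape).
    Returns rows of (method, distinct_papers, n_critics, n_beaters).
    """
    critics = defaultdict(set)
    beaters = defaultdict(set)
    for paper, method in critiques:
        critics[method].add(paper)
    for paper, method in benchmark_wins:
        beaters[method].add(paper)  # repeated wins by one paper count once

    rows = []
    for method in critics.keys() | beaters.keys():
        distinct = critics[method] | beaters[method]  # assumption: union
        rows.append(
            (method, len(distinct), len(critics[method]), len(beaters[method]))
        )
    # Most distinct papers first; ties broken alphabetically by method name.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Placeholder data: paper p2 both critiques and beats method A.
critiques = [("p1", "A"), ("p2", "A"), ("p1", "B")]
benchmark_wins = [("p2", "A"), ("p3", "A"), ("p3", "B"), ("p3", "B")]
ranking = rank_superseded(critiques, benchmark_wins)
# ranking == [("A", 3, 2, 2), ("B", 2, 1, 1)]
```

Keeping the two per-method counts alongside the combined one mirrors how each entry above reports critiques and benchmark wins separately.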

Sub-problems

Methods that compete on the same benchmarks cluster into distinct sub-problems.

MC-SMoE · 220 methods

MC-SMoE · NAEE · Fiddler · HC-SMoE · Mixtral-Offloading · SEER-MoE

HydraLoRA · 145 methods

HydraLoRA · LoRAMoE · MoELoRA · MoLE · MoLA · LEMoE

ReMoE · 149 methods

ReMoE · Soft MoE · DeepSeekMoE · PEER · ESFT · Lory

Switch Transformer · 143 methods

Switch Transformer · Tutel · X-MoE · DeepSpeed-MoE · GShard · FlexMoE

BTX · 80 methods

BTX · Upcycling · CLIP-MoE · FlexOlmo · Branch-Train-Merge · BTM

LLaMA-MoE · 60 methods

LLaMA-MoE · AdaMoE · DISP-LLM · ShortGPT · CMoE · static pruning

SteerMoE · 48 methods

SteerMoE · GSPO · GRPO · Mixtral · DPO · RICE

Expert Choice · 34 methods

Expert Choice · MoE-LLaVA · ST-MoE · Loss-free balancing · auxiliary losses · MoCLE

MoEQuant · 30 methods

MoEQuant · PMQ · Hessian · ODP · EAQuant · uniform bit-width quantization
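
The page does not say how methods are grouped into sub-problems. As a rough sketch of the idea that methods competing on the same benchmarks end up together, the simplest stand-in is connected components over a "shares a benchmark" relation, computed here with union-find. The function name `cluster_by_shared_benchmarks`, the `(method, benchmark)` input shape, and the placeholder names in the example are assumptions; a real system might well use a weighted or community-detection approach instead.

```python
def cluster_by_shared_benchmarks(results):
    """Group methods that are linked, directly or transitively, by a benchmark.

    results: iterable of (method, benchmark) pairs (assumed shape).
    Returns a list of sorted method lists, largest cluster first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_on_benchmark = {}
    for method, benchmark in results:
        find(method)  # register the method even if it shares nothing
        anchor = first_on_benchmark.setdefault(benchmark, method)
        union(anchor, method)

    clusters = {}
    for method in parent:
        clusters.setdefault(find(method), set()).add(method)
    return sorted(
        (sorted(members) for members in clusters.values()),
        key=lambda members: (-len(members), members),
    )


# Placeholder data: A and C never share a benchmark but are linked through B.
results = [("A", "b1"), ("B", "b1"), ("B", "b2"), ("C", "b2"), ("D", "b3")]
clusters = cluster_by_shared_benchmarks(results)
# clusters == [["A", "B", "C"], ["D"]]
```

Plain connected components tend to merge everything into one large group as soon as a few popular benchmarks are shared widely, which is one reason to treat this only as a starting point.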

The frontier

Recent methods not yet superseded in the knowledge base.