PEER
Mixture-of-experts routing
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites PEER as a baseline.
“Limited expressivity: existing designs (e.g., PEER [he2024peer]) reduce experts to static parameter vectors. This restricts the expert computation to linear vector aggregation, stripping away the token-dependent nonlinear transformations (e.g., MLP projections) essential for modeling complex linguistic dependencies.”
— OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
“Prior work in MoE research addresses this challenge either by restricting routing to predefined groups (hierarchical MoE) or by imposing structural constraints on expert representations (He et al., 2024). While these approaches reduce computational costs, they either limit routing flexibility or introduce restrictive assumptions that typically degrade performance.”
— Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
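To make the first critique concrete, here is a minimal sketch of the aggregation pattern it describes: each expert is a single static vector, and the layer output is a weighted sum of the retrieved vectors. This is an illustration of the quote's characterization only, not PEER's actual implementation; the function name, the dot-product routing, the softmax over the selected experts, and the toy shapes are all assumptions made for the example.

```python
import math

def vector_expert_layer(x, keys, values, k=2):
    """Weighted sum of the top-k static expert vectors for one token.

    x:      token hidden state, list of floats (length d)
    keys:   one hypothetical routing key per expert (each length d)
    values: one static parameter vector per expert
    """
    # Routing score for each expert: dot product between its key and the token.
    scores = [sum(ki * xi for ki, xi in zip(key, x)) for key in keys]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores (shifted by the max for stability).
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The "expert computation" is just a linear combination of fixed vectors:
    # no per-expert MLP is applied to the token.
    d_out = len(values[0])
    return [sum(w * values[i][j] for w, i in zip(weights, top)) for j in range(d_out)]

# Toy usage with three experts in two dimensions.
x = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
y = vector_expert_layer(x, keys, values, k=2)
```

In this sketch the token influences only the scalar mixing weights; the vectors being mixed never change, which is the limitation the quote points at.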
Beaten on benchmarks
Head-to-head results where a newer method reports beating PEER. Values are copied from the source paper's tables — verify against the cited paper.
- OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
OmniMoE beats PEER · Avg [6.4B-A1.7B models]
50.9 vs 48.9
- Mixture of Experts Made Intrinsically Interpretable
MoE-X beats PEER · Reconstruction [Mixture-of-Experts]
0.840 vs 0.426
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Small / WikiText-103]
21.82 vs 22.25
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Small / C5]
131.81 vs 145.32
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Small / OpenWebText2]
32.14 vs 33.10
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Medium / WikiText-103]
18.62 vs 18.71
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Medium / C5]
30.39 vs 31.60
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Medium / OpenWebText2]
20.51 vs 21.30
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Large / C5]
41.25 vs 45.37
- Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Large / OpenWebText2]
16.65 vs 17.88
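The AIR rows above report absolute perplexities, which are hard to compare across datasets of different difficulty. A small sketch of one way to normalize them is to compute the fractional reduction relative to PEER. The numbers are copied from the entries above (dataset labels kept exactly as listed); the helper name `relative_reduction` is hypothetical.

```python
# (setting, AIR perplexity, PEER perplexity), copied from the list above.
results = [
    ("Small / WikiText-103", 21.82, 22.25),
    ("Small / C5", 131.81, 145.32),
    ("Small / OpenWebText2", 32.14, 33.10),
    ("Medium / WikiText-103", 18.62, 18.71),
    ("Medium / C5", 30.39, 31.60),
    ("Medium / OpenWebText2", 20.51, 21.30),
    ("Large / C5", 41.25, 45.37),
    ("Large / OpenWebText2", 16.65, 17.88),
]

def relative_reduction(new, baseline):
    """Fractional perplexity reduction of `new` versus `baseline` (lower PPL is better)."""
    return (baseline - new) / baseline

reductions = {name: relative_reduction(air, peer) for name, air, peer in results}

for name, r in reductions.items():
    print(f"{name:24s} {100 * r:5.2f}% lower PPL than PEER")
```

On these figures the gap ranges from under 1% (Medium / WikiText-103) to a little over 9% (Small / C5), so the size of the reported win depends heavily on the setting.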
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 1, 2026
- May 24, 2026
- May 11, 2026
- May 6, 2026
- SPHERE · SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning · May 6, 2026
- bias-driven sparsification with always-active gated condenser experts · Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning · Apr 24, 2026
- Apr 23, 2026
- Feb 10, 2026
- Feb 9, 2026
- Feb 5, 2026
- GRIP (Geometric Routing Invariance Preservation) · GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints · Jan 23, 2026
- Jan 7, 2026