PEER (Mixture-of-experts routing): superseded, meaning it is cited as a baseline and beaten by newer methods. 2 papers critique it and 3 beat it on benchmarks, which ranks it #22 of 1,370 most-superseded. Sub-problem: cluster led by ReMoE. Newer alternatives in the same sub-problem include ProbMoE, Grouter, DECO, AIR-MoE, SPHERE.

Superseded baseline · #22 of 1,370 most-superseded

PEER

Mixture-of-experts routing

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 3 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites PEER as a baseline.

“Limited expressivity: existing designs (e.g., PEER; He et al., 2024) reduce experts to static parameter vectors. This restricts the expert computation to linear vector aggregation, stripping away the token-dependent nonlinear transformations (e.g., MLP projections) essential for modeling complex linguistic dependencies.”
— OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
“Prior work in MoE research addresses this challenge either by restricting routing to predefined groups (hierarchical MoE) or by imposing structural constraints on expert representations (He et al., 2024). While these approaches reduce computational costs, they either limit routing flexibility or introduce restrictive assumptions that typically degrade performance.”
— Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
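The contrast drawn in the first critique can be made concrete with a small NumPy sketch. It illustrates the quote's characterization only (experts as static vectors combined linearly, versus experts as small MLPs applied to the token); it is not a reproduction of PEER's or OmniMoE's actual layers. All names, shapes, and the dense top-k router here are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, num_experts, top_k = 8, 16, 32, 4  # illustrative sizes

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def route(x, keys, k):
    """Score every expert key against the token and keep the top k."""
    scores = keys @ x                       # (num_experts,)
    idx = np.argsort(scores)[-k:]           # indices of the k highest scores
    return idx, softmax(scores[idx])        # gates sum to 1 over selected experts

def vector_expert_layer(x, keys, values, k):
    """Experts as static vectors: the output is a gated sum of fixed rows.
    The token only influences which rows are picked and how they are weighted."""
    idx, gates = route(x, keys, k)
    return gates @ values[idx]              # (k,) @ (k, d_model) -> (d_model,)

def mlp_expert_layer(x, keys, w_in, w_out, k):
    """Experts as small MLPs: each selected expert transforms the token itself
    through a nonlinearity before the gated sum."""
    idx, gates = route(x, keys, k)
    out = np.zeros_like(x)
    for g, i in zip(gates, idx):
        hidden = np.maximum(w_in[i] @ x, 0.0)   # ReLU(W_in x)
        out += g * (w_out[i] @ hidden)
    return out

x = rng.normal(size=d_model)                          # one token embedding
keys = rng.normal(size=(num_experts, d_model))        # router keys
values = rng.normal(size=(num_experts, d_model))      # static expert vectors
w_in = rng.normal(size=(num_experts, d_hidden, d_model))
w_out = rng.normal(size=(num_experts, d_model, d_hidden))

print(vector_expert_layer(x, keys, values, top_k).shape)
print(mlp_expert_layer(x, keys, w_in, w_out, top_k).shape)
```

In the first layer the output is always a convex combination of the selected rows of `values`, so it can never leave the span of those fixed vectors; in the second, the token passes through each expert's own weights, which is the token-dependent transformation the quote says is lost.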

Beaten on benchmarks

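Head-to-head results where a newer method reports beating PEER. Each entry lists the newer method's value first and PEER's value second. For perplexity (PPL), lower is better. Values are copied from the source papers' tables; verify against the cited paper.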

OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
OmniMoE beats PEER · Avg [6.4B-A1.7B models] · 50.9 vs 48.9

Mixture of Experts Made Intrinsically Interpretable
MoE-X beats PEER · Reconstruction [Mixture-of-Experts] · 0.840 vs 0.426

Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
AIR beats PEER · PPL [Small / WikiText-103] · 21.82 vs 22.25
AIR beats PEER · PPL [Small / C5] · 131.81 vs 145.32
AIR beats PEER · PPL [Small / OpenWebText2] · 32.14 vs 33.10
AIR beats PEER · PPL [Medium / WikiText-103] · 18.62 vs 18.71
AIR beats PEER · PPL [Medium / C5] · 30.39 vs 31.60
AIR beats PEER · PPL [Medium / OpenWebText2] · 20.51 vs 21.30
AIR beats PEER · PPL [Large / C5] · 41.25 vs 45.37
AIR beats PEER · PPL [Large / OpenWebText2] · 16.65 vs 17.88
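
Raw perplexity gaps are hard to compare across datasets with very different scales, so it can help to express each AIR vs PEER result as a relative reduction. The sketch below does that arithmetic on the PPL values listed above; the helper name `relative_reduction` is hypothetical, and the numbers inherit whatever caveats apply to the copied table values.

```python
# (setting, AIR perplexity, PEER perplexity), copied from the PPL entries above.
results = [
    ("Small / WikiText-103", 21.82, 22.25),
    ("Small / C5", 131.81, 145.32),
    ("Small / OpenWebText2", 32.14, 33.10),
    ("Medium / WikiText-103", 18.62, 18.71),
    ("Medium / C5", 30.39, 31.60),
    ("Medium / OpenWebText2", 20.51, 21.30),
    ("Large / C5", 41.25, 45.37),
    ("Large / OpenWebText2", 16.65, 17.88),
]

def relative_reduction(new_ppl, baseline_ppl):
    """Percentage by which new_ppl is lower than baseline_ppl."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl

reductions = {setting: relative_reduction(air, peer) for setting, air, peer in results}

for setting, pct in reductions.items():
    print(f"{setting:<24} {pct:5.2f}% lower PPL than PEER")
```

On these figures the relative gap ranges from roughly 0.5% (Medium / WikiText-103) to roughly 9% (Small / C5), so the size of the reported advantage depends heavily on the dataset and model scale.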

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base, include ProbMoE, Grouter, DECO, AIR-MoE, and SPHERE.