Superseded baseline · #22 of 1,370 most-superseded

PEER

Mixture-of-experts routing

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 3 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites PEER as a baseline.

  • Limited expressivity: existing designs (e.g., PEER [he2024peer]) reduce experts to static parameter vectors. This restricts the expert computation to linear vector aggregation, stripping away the token-dependent nonlinear transformations (e.g., MLP projections) essential for modeling complex linguistic dependencies.
    OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
  • Prior work in MoE research addresses this challenge either by restricting routing to predefined groups (hierarchical MoE) or by imposing structural constraints on expert representations (He et al., 2024). While these approaches reduce computational costs, they either limit routing flexibility or introduce restrictive assumptions that typically degrade performance.
    Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts
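
The distinction drawn in the first critique can be made concrete with a toy sketch. The code below is illustrative only: it follows the critique's characterization ("static parameter vectors" versus token-dependent MLP experts) and is not the implementation from PEER, OmniMoE, or any other cited paper. All function names, the softmax gating, and the ReLU activation are assumptions chosen for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(scores, k):
    """Indices of the k largest routing scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(x, keys, k):
    """Hypothetical router: score each expert key against the token, keep top-k."""
    scores = [dot(x, key) for key in keys]
    idx = top_k(scores, k)
    gates = softmax([scores[i] for i in idx])
    return idx, gates

def static_vector_layer(x, keys, values, k):
    """Experts as fixed vectors: the token only decides which vectors
    are picked and how they are weighted (linear aggregation)."""
    idx, gates = route(x, keys, k)
    out = [0.0] * len(values[0])
    for g, i in zip(gates, idx):
        for d in range(len(out)):
            out[d] += g * values[i][d]
    return out

def mlp_expert_layer(x, keys, experts, k):
    """Experts as small MLPs (w_in, w_out): each selected expert applies a
    nonlinear transformation to the token itself before aggregation."""
    idx, gates = route(x, keys, k)
    out = [0.0] * len(x)
    for g, i in zip(gates, idx):
        w_in, w_out = experts[i]
        hidden = [max(0.0, dot(row, x)) for row in w_in]  # ReLU (assumed)
        for d in range(len(out)):
            out[d] += g * dot(w_out[d], hidden)
    return out
```

In this sketch, two tokens that route to the same static-vector expert with the same gate weights receive identical outputs, whereas the MLP variant produces outputs that also depend on the token's own values.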

Beaten on benchmarks

Head-to-head results where a newer method reports beating PEER. Values are copied from the source paper's tables — verify against the cited paper.

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.