X-MoE (Mixture-of-experts routing): superseded, meaning it is cited as a baseline and beaten by newer methods. 3 papers critique it and 3 beat it on benchmarks, ranking it #18 of 1,370 most-superseded. Sub-problem: the cluster led by Switch Transformer. Newer alternatives in the same sub-problem include ConceptM³oE, DisagMoE, Piper, GRACE-MoE, and ReaLB.

Superseded baseline · #18 of 1,370 most-superseded

X-MoE

XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection

Mixture-of-experts routing · first seen Feb 27, 2024


3 papers critique it · 3 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites X-MoE as a baseline.

“pure cosine scoring eliminates magnitude cues, whereas SIPS retains them with bounded influence.”
— L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts
“However, for 500B+ models, X-MoE achieves only 5% MFU.”
— Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
“addressed representation collapse by routing in a low-dimensional space, but experts still operated on high-dimensional inputs.”
— Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism
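Two of these critiques concern how X-MoE scores experts: pure cosine scoring discards the token's magnitude, and routing happens in a low-dimensional space while the experts still consume the full hidden state. The toy pure-Python router here illustrates both points under stated assumptions. It is not the XMoE, L2R, or LatentMoE code; the sizes `D_MODEL`, `D_LOW`, `N_EXPERTS`, the projection `proj`, and the temperature `tau` are hypothetical, and SIPS is not reproduced. Scaling a token by 3 changes the dot-product router's probabilities but leaves the low-dimensional cosine router's probabilities unchanged, and the selected expert still multiplies the full `D_MODEL`-dimensional input.

```python
import math
import random

# Hypothetical sizes for illustration only.
D_MODEL, D_LOW, N_EXPERTS = 8, 2, 4
rng = random.Random(0)


def randmat(rows, cols):
    """Random Gaussian matrix as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def softmax(logits):
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_router(h, keys):
    """Linear router: logits scale with ||h||, so magnitude changes routing confidence."""
    return softmax([dot(h, k) for k in keys])


def low_dim_cosine_router(h, proj, keys_low, tau=0.5):
    """Project to D_LOW, normalize, score by cosine / tau; ||h|| no longer matters."""
    z = l2_normalize(matvec(proj, h))
    return softmax([dot(z, l2_normalize(k)) / tau for k in keys_low])


def expert_ffn(h, w):
    """Toy expert: ReLU(W h) applied to the full D_MODEL input."""
    return [max(0.0, x) for x in matvec(w, h)]


keys_full = randmat(N_EXPERTS, D_MODEL)   # expert keys for the dot-product router
proj = randmat(D_LOW, D_MODEL)            # hypothetical D_MODEL -> D_LOW projection
keys_low = randmat(N_EXPERTS, D_LOW)      # expert embeddings in the low-dim space
experts = [randmat(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]

h = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]
h_scaled = [3.0 * x for x in h]           # same direction, 3x the magnitude

p_dot = dot_product_router(h, keys_full)
p_dot_scaled = dot_product_router(h_scaled, keys_full)
p_cos = low_dim_cosine_router(h, proj, keys_low)
p_cos_scaled = low_dim_cosine_router(h_scaled, proj, keys_low)

# Top-1 selection from the low-dim router; the expert still sees all D_MODEL features.
top1 = max(range(N_EXPERTS), key=lambda i: p_cos[i])
out = expert_ffn(h, experts[top1])

print("dot-product router, h vs 3h:", [round(p, 3) for p in p_dot], [round(p, 3) for p in p_dot_scaled])
print("low-dim cosine router, h vs 3h:", [round(p, 3) for p in p_cos], [round(p, 3) for p in p_cos_scaled])
print("top-1 expert:", top1, "| router input dim:", D_LOW, "| expert output dim:", len(out))
```

Because `l2_normalize` runs after the projection, all magnitude information is gone before scoring. That is the property the L2R quote contrasts with SIPS, which it says retains magnitude "with bounded influence."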

Beaten on benchmarks

Head-to-head results where a newer method reports beating X-MoE. Values are copied from each source paper's tables; verify them against the cited paper. For test perplexity (PPL), lower is better.

PathB4-MoE beats X-MoE · Avg. [Path-Constrained Routing]
49.62 vs 48.44
Path-Constrained Mixture-of-Experts
L2R (SIPS) beats X-MoE · Overall [OLMoE 64 experts top-k=8]
43.4 vs 42.1
L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts
Similarity-Aware SMoE beats X-MoE · Test PPL [K=2, Clean Wikitext-103]
32.03 vs 34.49
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
Similarity-Aware SMoE beats X-MoE · Test PPL [K=2, Attacked Wikitext-103]
39.92 vs 42.96
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
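The four results mix metric directions. Here the Avg. and Overall scores are treated as higher-is-better (an assumption, since the page does not name the underlying metric), while test perplexity is lower-is-better. A small sketch that recomputes each margin from the values listed above, signed so that a positive number means the newer method wins:

```python
# (method, metric, setting, new_value, x_moe_value, higher_is_better)
# Values copied from the list above; higher_is_better for Avg./Overall is an assumption.
RESULTS = [
    ("PathB4-MoE", "Avg.", "Path-Constrained Routing", 49.62, 48.44, True),
    ("L2R (SIPS)", "Overall", "OLMoE 64 experts top-k=8", 43.4, 42.1, True),
    ("Similarity-Aware SMoE", "Test PPL", "K=2, Clean Wikitext-103", 32.03, 34.49, False),
    ("Similarity-Aware SMoE", "Test PPL", "K=2, Attacked Wikitext-103", 39.92, 42.96, False),
]


def margin(new, base, higher_is_better):
    """Signed improvement over X-MoE: positive means the newer method wins."""
    return (new - base) if higher_is_better else (base - new)


for method, metric, setting, new, base, hib in RESULTS:
    m = margin(new, base, hib)
    rel = 100.0 * m / base  # margin as a percentage of the X-MoE value
    print(f"{method:22s} {metric:8s} [{setting}]  margin {m:+.2f} ({rel:+.1f}% of X-MoE)")
```

Relative margins are shown only for orientation; they are not comparable across the three papers, whose setups differ.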

Newer alternatives

Recent methods in the same sub-problem that are not yet superseded in the knowledge base include ConceptM³oE, DisagMoE, Piper, GRACE-MoE, and ReaLB.