Is FlexMoE superseded?

FlexMoE (mixture-of-experts routing) is superseded: it is cited as a baseline and beaten by newer methods. Two papers critique it and one reports beating it on benchmarks, which places it #55 of 1,370 on the most-superseded list. It belongs to the sub-problem cluster led by Switch Transformer. Newer alternatives in the same sub-problem include ConceptM³oE, DisagMoE, Piper, GRACE-MoE and ReaLB.

FlexMoE

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Mixture-of-experts routing · first seen Apr 8, 2023

What papers say

Verbatim critique sentences, each from a paper that cites FlexMoE as a baseline.

“each re-balancing introduces significant overhead due to copying optimizer state, limiting the frequency that rebalancing can be performed and thus the efficacy of its adaptive replication.”
— SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling
“still cannot resolve the additional communication of expert parameters”
— LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
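The SYMI critique concerns FlexMoE's adaptive replication: heavily loaded experts get extra replicas, and every rebalance has to copy state onto the new replicas. The sketch below illustrates that trade-off under stated assumptions. The greedy `replicas_by_load` policy is hypothetical and is not FlexMoE's actual placement algorithm. The per-parameter byte counts assume ZeRO-style mixed-precision Adam accounting (2 bytes of fp16 weights plus 12 bytes of fp32 master weights and Adam moments). The expert sizes and token counts are invented toy values.

```python
from dataclasses import dataclass

# Assumption: ZeRO-style mixed-precision Adam accounting per parameter.
WEIGHT_BYTES = 2      # fp16 weights
OPTIMIZER_BYTES = 12  # fp32 master weights + Adam first and second moments


@dataclass
class Expert:
    name: str
    params: int  # parameter count of this expert
    tokens: int  # tokens routed to it in the last measurement window


def replicas_by_load(experts, slots):
    """Hypothetical greedy policy: every expert gets one replica, then each
    spare slot goes to the expert with the highest tokens-per-replica."""
    reps = {e.name: 1 for e in experts}
    for _ in range(slots - len(experts)):
        busiest = max(experts, key=lambda e: e.tokens / reps[e.name])
        reps[busiest.name] += 1
    return reps


def rebalance_bytes(experts, old, new, copy_optimizer=True):
    """Bytes copied when replica counts change from `old` to `new`.
    Only newly created replicas cost a copy: each receives the expert's
    weights and, when copy_optimizer is True, its optimizer state too."""
    per_param = WEIGHT_BYTES + (OPTIMIZER_BYTES if copy_optimizer else 0)
    total = 0
    for e in experts:
        added = max(0, new[e.name] - old[e.name])
        total += added * e.params * per_param
    return total


experts = [
    Expert("e0", params=1_000_000, tokens=900),
    Expert("e1", params=1_000_000, tokens=300),
    Expert("e2", params=1_000_000, tokens=300),
    Expert("e3", params=1_000_000, tokens=300),
]
old = {"e0": 1, "e1": 2, "e2": 2, "e3": 1}
new = replicas_by_load(experts, slots=6)
print(new)  # {'e0': 3, 'e1': 1, 'e2': 1, 'e3': 1}
print(rebalance_bytes(experts, old, new))  # 28000000
print(rebalance_bytes(experts, old, new, copy_optimizer=False))  # 4000000
```

Under this accounting, optimizer state makes up 12 of the 14 bytes moved per parameter, so a scheme that must copy it on every rebalance can afford to rebalance far less often. Even with `copy_optimizer=False`, the weights themselves still have to move.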

Beaten on benchmarks

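Head-to-head results in which a newer method reports beating FlexMoE. Values are copied from the source paper's tables; verify them against the cited paper. The tasks (48-hour in-hospital mortality, length of stay, 25-phenotype classification, and inputs with varying rates of missing modalities) are multimodal clinical benchmarks, which suggests the compared baseline may be the similarly named Flex-MoE for arbitrary modality combinations rather than this FlexMoE training system.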

All rows: ConfSMoE beats FlexMoE, from Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate.

Setting | Metric | ConfSMoE | FlexMoE
48-IHM | F1 | 49.18 | 35.29
48-IHM | AUC | 85.24 | 80.45
LOS | F1 | 61.35 | 56.96
LOS | AUC | 78.22 | 74.81
25-PHE | F1 | 28.67 | 24.61
25-PHE | AUC | 74.56 | 71.57
0% missing | F1 | 51.83 | 51.50
0% missing | AUC | 80.62 | 79.38
10% missing | F1 | 50.89 | 49.92
10% missing | AUC | 78.06 | 77.07
20% missing | F1 | 50.58 | 49.16
20% missing | AUC | 77.17 | 73.80
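Because every row compares the same two methods, the size of each gap can be read off by subtraction. This short sketch computes the absolute margin (ConfSMoE minus FlexMoE, in the table's own units) for each setting, using only the values listed above.

```python
# (setting, metric, ConfSMoE, FlexMoE), copied from the benchmark rows
ROWS = [
    ("48-IHM", "F1", 49.18, 35.29),
    ("48-IHM", "AUC", 85.24, 80.45),
    ("LOS", "F1", 61.35, 56.96),
    ("LOS", "AUC", 78.22, 74.81),
    ("25-PHE", "F1", 28.67, 24.61),
    ("25-PHE", "AUC", 74.56, 71.57),
    ("0% missing", "F1", 51.83, 51.50),
    ("0% missing", "AUC", 80.62, 79.38),
    ("10% missing", "F1", 50.89, 49.92),
    ("10% missing", "AUC", 78.06, 77.07),
    ("20% missing", "F1", 50.58, 49.16),
    ("20% missing", "AUC", 77.17, 73.80),
]


def margins(rows):
    """Absolute margin of ConfSMoE over FlexMoE, rounded to two decimals."""
    return {(s, m): round(a - b, 2) for s, m, a, b in rows}


gaps = margins(ROWS)
widest = max(gaps, key=gaps.get)
narrowest = min(gaps, key=gaps.get)
print(widest, gaps[widest])        # ('48-IHM', 'F1') 13.89
print(narrowest, gaps[narrowest])  # ('0% missing', 'F1') 0.33
```

Every margin is positive. They range from 0.33 points (F1, 0% missing) to 13.89 points (F1, 48-IHM).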

Newer alternatives

Recent methods in the same sub-problem that are not yet superseded in the knowledge base include ConceptM³oE, DisagMoE, Piper, GRACE-MoE and ReaLB.