FlexMoE
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Mixture-of-experts routing · first seen Apr 8, 2023
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 1 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites FlexMoE as a baseline.
“each re-balancing introduces significant overhead due to copying optimizer state, limiting the frequency that rebalancing can be performed and thus the efficacy of its adaptive replication.”
— SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling
“still cannot resolve the additional communication of expert parameters”
— LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
Beaten on benchmarks
Head-to-head results where a newer method reports beating FlexMoE. Values are copied from the source paper's tables — verify against the cited paper.
All results below are from: Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate
- ConfSMoE beats FlexMoE · F1 [48-IHM] · 49.18 vs 35.29
- ConfSMoE beats FlexMoE · AUC [48-IHM] · 85.24 vs 80.45
- ConfSMoE beats FlexMoE · F1 [LOS] · 61.35 vs 56.96
- ConfSMoE beats FlexMoE · AUC [LOS] · 78.22 vs 74.81
- ConfSMoE beats FlexMoE · F1 [25-PHE] · 28.67 vs 24.61
- ConfSMoE beats FlexMoE · AUC [25-PHE] · 74.56 vs 71.57
- ConfSMoE beats FlexMoE · F1 [0% missing] · 51.83 vs 51.50
- ConfSMoE beats FlexMoE · AUC [0% missing] · 80.62 vs 79.38
- ConfSMoE beats FlexMoE · F1 [10% missing] · 50.89 vs 49.92
- ConfSMoE beats FlexMoE · AUC [10% missing] · 78.06 vs 77.07
- ConfSMoE beats FlexMoE · F1 [20% missing] · 50.58 vs 49.16
- ConfSMoE beats FlexMoE · AUC [20% missing] · 77.17 vs 73.80
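To compare the head-to-head results at a glance, the listed values can be turned into absolute margins. The sketch below is illustrative only. It assumes that the first number in each "X vs Y" pair is the ConfSMoE score and the second is the FlexMoE score (the page does not say so explicitly, but this matches "ConfSMoE beats FlexMoE" for F1 and AUC, where higher is better). The names `results` and `margins` are made up for this example.

```python
# (metric, setting) -> (first value, second value), copied from the list above.
# Assumption: first value = ConfSMoE, second value = FlexMoE.
results = {
    ("F1", "48-IHM"): (49.18, 35.29),
    ("AUC", "48-IHM"): (85.24, 80.45),
    ("F1", "LOS"): (61.35, 56.96),
    ("AUC", "LOS"): (78.22, 74.81),
    ("F1", "25-PHE"): (28.67, 24.61),
    ("AUC", "25-PHE"): (74.56, 71.57),
    ("F1", "0% missing"): (51.83, 51.50),
    ("AUC", "0% missing"): (80.62, 79.38),
    ("F1", "10% missing"): (50.89, 49.92),
    ("AUC", "10% missing"): (78.06, 77.07),
    ("F1", "20% missing"): (50.58, 49.16),
    ("AUC", "20% missing"): (77.17, 73.80),
}


def margins(table):
    """Return the absolute margin (first minus second) per entry, rounded to 2 decimals."""
    return {key: round(first - second, 2) for key, (first, second) in table.items()}


deltas = margins(results)

# Print the entries from the largest margin to the smallest.
for (metric, setting), delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{metric:>3} [{setting}]: +{delta:.2f}")
```

Under that assumption, the margin is largest on F1 [48-IHM] (13.89 points) and smallest on F1 [0% missing] (0.33 points). As the page notes, the underlying values should still be verified against the cited paper.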
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- ConceptM$^3$oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology · May 23, 2026
- DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism · May 10, 2026
- Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism · May 6, 2026
- GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference · May 6, 2026
- Apr 21, 2026
- Feb 12, 2026
- Multi-Head LatentMoE and Head Parallel (HP) · Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism · Feb 4, 2026
- Jan 29, 2026
- Rasterized Steered Mixture of Experts for Efficient 2D Image Regression · Oct 7, 2025
- Sep 30, 2025
- Sep 24, 2025