FlexOlmo
FlexOlmo: Open Language Models for Flexible Data Use
Mixture-of-experts routing · first seen Jul 9, 2025
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 1 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites FlexOlmo as a baseline.
“freezing shared (non-FFN) parameters during expert training (as done in [Shi et al., 2025]) significantly degrades performance in our setting”
— Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
“reliance on similarity-based proxy selection often produces redundant and narrowly concentrated proxies, limiting coverage of domain-relevant modes and weakening router supervision”
— MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
Beaten on benchmarks
Head-to-head results where a newer method reports beating FlexOlmo. Values are copied from the source paper's tables — verify against the cited paper.
- MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE beats FlexOlmo · Average Accuracy [CLIP ViT-B/32]
94.52 vs 92.92
- MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE beats FlexOlmo · Average Accuracy [CLIP ViT-B/16]
96.24 vs 93.53
- MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE beats FlexOlmo · Average Accuracy [LLaMA-3.2-3B]
74.42 vs 72.50
- MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE beats FlexOlmo · Average Accuracy [LLaMA-3.1-8B]
81.59 vs 77.46
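To put the four head-to-head results on a common footing, the short sketch below computes the absolute gap for each backbone. It assumes the reported values are average accuracies in percent, so each gap is in percentage points. The names `results` and `margins` are illustrative and do not come from either paper.

```python
# Scores copied from the list above: (backbone, newer method, FlexOlmo).
# Assumption: both columns are average accuracy in percent.
results = [
    ("CLIP ViT-B/32", 94.52, 92.92),
    ("CLIP ViT-B/16", 96.24, 93.53),
    ("LLaMA-3.2-3B", 74.42, 72.50),
    ("LLaMA-3.1-8B", 81.59, 77.46),
]


def margins(rows):
    """Return (backbone, absolute gap in percentage points) per row."""
    # Round to two decimals to hide binary floating-point noise.
    return [(name, round(new - old, 2)) for name, new, old in rows]


for name, gap in margins(results):
    print(f"{name}: +{gap:.2f} points over FlexOlmo")
```

Under that assumption the gaps are 1.60, 2.71, 1.92, and 4.13 points. The largest reported gap is on LLaMA-3.1-8B and the smallest on CLIP ViT-B/32. As with the values themselves, these differences should be checked against the cited paper's tables.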
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- MetaMoE · MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification · May 14, 2026
- Apr 20, 2026
- BERT-MoE Framework · Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews · Feb 13, 2026
- null experts within token-choice MoE · Improving MoE Compute Efficiency by Composing Weight and Data Sparsity · Jan 21, 2026
- MixtureKit · MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models · Dec 13, 2025
- ERMoE · ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization · Nov 14, 2025
- Dirichlet-Prior Shaping Loss (DPSL) · Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs · Oct 1, 2025
- Symphony-MoE · Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts · Sep 23, 2025