Is FlexOlmo superseded?

FlexOlmo (Mixture-of-experts routing): superseded, meaning it is cited as a baseline and beaten by newer methods. 2 papers critique it and 1 beats it on benchmarks, which ranks it #56 of 1,370 most-superseded. Sub-problem: cluster led by BTX. Newer alternatives in the same sub-problem include MetaMoE, BAR, BERT-MoE Framework, null experts within token-choice MoE, and MixtureKit.

Superseded baseline · #56 of 1,370 most-superseded

FlexOlmo

FlexOlmo: Open Language Models for Flexible Data Use

Mixture-of-experts routing · first seen Jul 9, 2025

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 1 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites FlexOlmo as a baseline.

“freezing shared (non-FFN) parameters during expert training (as done in shi2025flexolmoopenlanguagemodels) significantly degrades performance in our setting”
— Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
“reliance on similarity-based proxy selection often produces redundant and narrowly concentrated proxies, limiting coverage of domain-relevant modes and weakening router supervision”
— MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification

Beaten on benchmarks

Head-to-head results where a newer method reports beating FlexOlmo. Values are copied from the source paper's tables — verify against the cited paper.

MetaMoE beats FlexOlmo · Average Accuracy [CLIP ViT-B/32]
94.52 vs 92.92
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE beats FlexOlmo · Average Accuracy [CLIP ViT-B/16]
96.24 vs 93.53
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE beats FlexOlmo · Average Accuracy [LLaMA-3.2-3B]
74.42 vs 72.50
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE beats FlexOlmo · Average Accuracy [LLaMA-3.1-8B]
81.59 vs 77.46
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
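
To make the size of each reported gap easier to read, the head-to-head scores listed above can be turned into per-backbone margins. The following is a minimal sketch, not part of the source page: the `results` list and the `margins` helper are illustrative names, and the numbers are copied from the entries above, so they carry the same caveat about verifying against the cited paper.

```python
# Scores copied from the head-to-head entries above:
# (backbone, newer method's average accuracy, FlexOlmo's average accuracy).
results = [
    ("CLIP ViT-B/32", 94.52, 92.92),
    ("CLIP ViT-B/16", 96.24, 93.53),
    ("LLaMA-3.2-3B", 74.42, 72.50),
    ("LLaMA-3.1-8B", 81.59, 77.46),
]


def margins(rows):
    """Return (backbone, gap) pairs, where gap = newer score - FlexOlmo score.

    Gaps are rounded to two decimals to match the precision of the inputs
    and to hide floating-point noise from the subtraction.
    """
    return [(name, round(new - old, 2)) for name, new, old in rows]


for name, gap in margins(results):
    print(f"{name}: +{gap:.2f} points")
```

On these four entries the gaps are 1.60, 2.71, 1.92, and 4.13 points, so the reported margin is largest on the LLaMA-3.1-8B backbone and smallest on CLIP ViT-B/32.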

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.