Superseded baseline · #65 of 1,370 most-superseded
Branch-Train-Merge
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Mixture-of-experts routing · first seen Aug 5, 2022
Superseded: cited as a baseline and beaten by newer methods
3 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Branch-Train-Merge as a baseline.
“does not yield a single unified model (hindering downstream SFT/RLHF and incurring inference overhead)”
Source: MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
“While this approach makes training more efficient, its main drawback is the lack of a unified single model making it impossible to do further supervised finetuning (SFT) or reinforcement learning from human feedback (RLHF) finetuning”
Source: Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
“Despite its strong performance, this architecture limited the ability to further fine-tune the individual experts' components within the unified structure.”
Source: MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models
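The quoted critiques share one point. Branch-Train-Merge keeps its domain experts as separate language models and combines them at inference time, instead of folding them into one network. What follows is a hypothetical illustration of such an output-level ensemble, not BTM's released code. It makes two simplifying assumptions:

- Each expert is a plain Python function that returns a next-token distribution.
- The mixing weights are supplied by the caller. How BTM actually sets them is not modeled here.

`ensemble_next_token`, `code_expert`, and `news_expert` are illustrative names only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical simplification: an "expert LM" is any function that maps a
# context (a sequence of tokens) to a next-token probability distribution.
Distribution = Dict[str, float]
ExpertLM = Callable[[Sequence[str]], Distribution]


def ensemble_next_token(
    experts: List[ExpertLM],
    weights: List[float],
    context: Sequence[str],
) -> Tuple[Distribution, int]:
    """Mix the experts' next-token distributions with normalized weights.

    Returns the mixed distribution and the number of expert forward passes.
    In this sketch every expert runs on every step, and the combined "model"
    exists only as this output-level mixture, not as one set of weights.
    """
    if not experts or len(experts) != len(weights):
        raise ValueError("need exactly one weight per expert")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")

    mixed: Distribution = {}
    forward_passes = 0
    for expert, weight in zip(experts, weights):
        dist = expert(context)  # one full forward pass per expert
        forward_passes += 1
        for token, prob in dist.items():
            mixed[token] = mixed.get(token, 0.0) + (weight / total) * prob
    return mixed, forward_passes


# Hypothetical toy experts that ignore the context and return fixed distributions.
def code_expert(context: Sequence[str]) -> Distribution:
    return {"def": 0.7, "the": 0.3}


def news_expert(context: Sequence[str]) -> Distribution:
    return {"def": 0.1, "the": 0.9}


mixed, passes = ensemble_next_token([code_expert, news_expert], [3.0, 1.0], ["import"])
print(passes)  # 2
print(mixed)
```

Here the combined model is only a weighted sum of outputs. There is no single weight set to hand to an SFT or RLHF pipeline, and each generated token costs one forward pass per expert. Those are the two drawbacks the quoted papers raise.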
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- MetaMoE · MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification · May 14, 2026
- Apr 20, 2026
- BERT-MoE Framework · Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews · Feb 13, 2026
- null experts within token-choice MoE · Improving MoE Compute Efficiency by Composing Weight and Data Sparsity · Jan 21, 2026
- MixtureKit · MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models · Dec 13, 2025
- ERMoE · ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization · Nov 14, 2025
- Dirichlet-Prior Shaping Loss (DPSL) · Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs · Oct 1, 2025
- Symphony-MoE · Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts · Sep 23, 2025