MoELoRA
MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models
Mixture-of-experts routing · first seen Feb 20, 2024
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 5 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites MoELoRA as a baseline.
“These methods still exhibit substantial performance degradation on earlier tasks as training progresses”
— SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning
Beaten on benchmarks
Head-to-head results where a newer method reports beating MoELoRA. Values are copied from the source paper's tables — verify against the cited paper.
- SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning
  SAME (MOMO) beats MoELoRA · Average [all tasks TriGap] · 46.53 vs 44.45
  SAME (MOMO) beats MoELoRA · Accuracy [all tasks CoIN] · 66.82 vs 50.58
  SAME (MOMO) beats MoELoRA · Average [all tasks UCIT] · 67.12 vs 52.06
- On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models
  LLaVA-DyMoE beats MoELoRA · MFN [main] · 57.03 vs 43.93
  LLaVA-DyMoE beats MoELoRA · MAA [main] · 57.70 vs 43.92
  LLaVA-DyMoE beats MoELoRA · MFN [+ ASD] · 60.55 vs 43.93
  LLaVA-DyMoE beats MoELoRA · MAA [+ ASD] · 62.26 vs 43.92
- LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning
  LiMELoRA beats MoELoRA · Commonsense Reasoning [Commonsense Reasoning] · 84.98 vs 84.08
- LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models
  LLaVA-CMoE beats MoELoRA · Mean [Immediate] · 62.81 vs 62.10
  LLaVA-CMoE beats MoELoRA · Mean [Last] · 59.23 vs 44.24
- PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning
  PASs-MoE beats MoELoRA · Acc [Math QA] · 49.52 vs 42.98
  PASs-MoE beats MoELoRA · Acc [Arts VQA] · 43.22 vs 35.89
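The size of each reported margin can be read off with simple subtraction. The sketch below is illustrative only: the scores are copied from the list above, the names `results` and `margins` are hypothetical, and the gaps come from different papers, metrics, and benchmarks, so they are not comparable across rows.

```python
# Each row: (newer method, metric [setting], newer score, MoELoRA score).
# Scores are copied from the head-to-head list above; verify against the cited papers.
results = [
    ("SAME", "Average [all tasks TriGap]", 46.53, 44.45),
    ("SAME", "Accuracy [all tasks CoIN]", 66.82, 50.58),
    ("SAME", "Average [all tasks UCIT]", 67.12, 52.06),
    ("LLaVA-DyMoE", "MFN [main]", 57.03, 43.93),
    ("LLaVA-DyMoE", "MAA [main]", 57.70, 43.92),
    ("LLaVA-DyMoE", "MFN [+ ASD]", 60.55, 43.93),
    ("LLaVA-DyMoE", "MAA [+ ASD]", 62.26, 43.92),
    ("LiMELoRA", "Commonsense Reasoning", 84.98, 84.08),
    ("LLaVA-CMoE", "Mean [Immediate]", 62.81, 62.10),
    ("LLaVA-CMoE", "Mean [Last]", 59.23, 44.24),
    ("PASs-MoE", "Acc [Math QA]", 49.52, 42.98),
    ("PASs-MoE", "Acc [Arts VQA]", 43.22, 35.89),
]


def margins(rows):
    """Return (method, metric, gap in points) tuples, largest gap first."""
    gaps = [(method, metric, round(new - old, 2)) for method, metric, new, old in rows]
    return sorted(gaps, key=lambda row: row[2], reverse=True)


ranked = margins(results)
for method, metric, gap in ranked:
    print(f"{method:<12} {metric:<28} +{gap:.2f}")
```

Sorting by gap makes it easy to separate near-ties (under one point) from large reported improvements, which is a reasonable first filter before checking the source tables.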
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- PARAMΔ Integration into Upcycled MoE · A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE · May 18, 2026
- MEMIT-like framework for MoE · Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates · May 15, 2026
- May 11, 2026
- May 8, 2026
- Apr 28, 2026
- CoGR-MoE · CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering · Apr 18, 2026
- Apr 2, 2026
- On Token's Dilemma · On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models · Mar 29, 2026
- Mixture-of-Experts (MoE) and Mixture-of-Linear-Experts (MoLE) architectures for MLIPs · Scaling Machine Learning Interatomic Potentials with Mixtures of Experts · Mar 9, 2026
- Mar 5, 2026
- Feb 13, 2026
- Multiscale Interaction Mixture of Experts (MI-MoE) · Topology-Aware Multiscale Mixture of Experts for Efficient Molecular Property Prediction · Jan 19, 2026