Is MoELoRA superseded?

MoELoRA (Mixture-of-experts routing) is superseded: it is cited as a baseline and beaten by newer methods. 1 paper critiques it and 5 beat it on benchmarks, which places it #14 of 1,370 most-superseded. Sub-problem: cluster led by HydraLoRA. Newer alternatives in the same sub-problem include PARAMΔ Integration into Upcycled MoE, MEMIT-like framework for MoE, HELLoRA, SDG-MoE, and Marco-MoE.

Superseded baseline · #14 of 1,370 most-superseded

MoELoRA

MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

Mixture-of-experts routing · first seen Feb 20, 2024

superseded — cited as a baseline and beaten by newer methods

1 paper critiques it · 5 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites MoELoRA as a baseline.

“These methods still exhibit substantial performance degradation on earlier tasks as training progresses”
— SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

Beaten on benchmarks

Head-to-head results in which a newer method reports beating MoELoRA. Values are copied from the tables in each source paper; verify them against the cited paper.

SAME (MOMO) beats MoELoRA · Average [all tasks TriGap] · 46.53 vs 44.45
SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

SAME (MOMO) beats MoELoRA · Accuracy [all tasks CoIN] · 66.82 vs 50.58
SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

SAME (MOMO) beats MoELoRA · Average [all tasks UCIT] · 67.12 vs 52.06
SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

LLaVA-DyMoE (Ours) beats MoELoRA · MFN [main] · 57.03 vs 43.93
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

LLaVA-DyMoE (Ours) beats MoELoRA · MAA [main] · 57.70 vs 43.92
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

LLaVA-DyMoE (Ours) beats MoELoRA · MFN [+ ASD] · 60.55 vs 43.93
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

LLaVA-DyMoE (Ours) beats MoELoRA · MAA [+ ASD] · 62.26 vs 43.92
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

LiMELoRA beats MoELoRA · Commonsense Reasoning [Commonsense Reasoning] · 84.98 vs 84.08
LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

LLaVA-CMoE beats MoELoRA · Mean [Immediate] · 62.81 vs 62.10
LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models

LLaVA-CMoE beats MoELoRA · Mean [Last] · 59.23 vs 44.24
LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models

PASs-MoE beats MoELoRA · Acc [Math QA] · 49.52 vs 42.98
PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning

PASs-MoE beats MoELoRA · Acc [Arts VQA] · 43.22 vs 35.89
PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning
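
The size of each reported gap can be read off with simple subtraction. The sketch below is only arithmetic on the values listed above, transcribed by hand; the names RESULTS, margin, and rank_by_margin are hypothetical and not part of any tool or paper. The metrics come from different benchmarks and papers, so the margins are not directly comparable across entries and should be treated as a rough guide only.

```python
# Values transcribed from the entries above:
# (newer method, metric [setting], newer score, MoELoRA score).
# Assumption: all scores are on a 0-100 scale, as they appear in the listing.
RESULTS = [
    ("SAME (MOMO)", "Average [all tasks TriGap]", 46.53, 44.45),
    ("SAME (MOMO)", "Accuracy [all tasks CoIN]", 66.82, 50.58),
    ("SAME (MOMO)", "Average [all tasks UCIT]", 67.12, 52.06),
    ("LLaVA-DyMoE (Ours)", "MFN [main]", 57.03, 43.93),
    ("LLaVA-DyMoE (Ours)", "MAA [main]", 57.70, 43.92),
    ("LLaVA-DyMoE (Ours)", "MFN [+ ASD]", 60.55, 43.93),
    ("LLaVA-DyMoE (Ours)", "MAA [+ ASD]", 62.26, 43.92),
    ("LiMELoRA", "Commonsense Reasoning", 84.98, 84.08),
    ("LLaVA-CMoE", "Mean [Immediate]", 62.81, 62.10),
    ("LLaVA-CMoE", "Mean [Last]", 59.23, 44.24),
    ("PASs-MoE", "Acc [Math QA]", 49.52, 42.98),
    ("PASs-MoE", "Acc [Arts VQA]", 43.22, 35.89),
]


def margin(newer: float, baseline: float) -> float:
    """Absolute gap in points, rounded to the two decimals used in the listing."""
    return round(newer - baseline, 2)


def rank_by_margin(results):
    """Return (method, metric, margin) tuples, largest reported gap first."""
    rows = [(m, metric, margin(new, base)) for m, metric, new, base in results]
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_by_margin(RESULTS)
for method, metric, gap in ranked:
    print(f"{gap:6.2f}  {method} · {metric}")
```

On these values the gaps range from 0.71 points (LLaVA-CMoE, Mean [Immediate]) to 18.34 points (LLaVA-DyMoE, MAA [+ ASD]).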

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.