MoELoRA
MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models
Parameter-efficient fine-tuning (LoRA family) · first seen Feb 20, 2024
superseded — cited as a baseline and beaten by newer methods
11 papers critique it · 10 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites MoELoRA as a baseline.
“Compared to the base model, the three multi-task LoRA method (MOELoRA, MTL-LoRA, and HydraLoRA) are effective, but fail to effectively learn the instruction patterns in the pre-trained weights due to the random initialization of their experts”
— CoLA: Collaborative Low-Rank Adaptation
“However, our experimental results indicate that these models are less effective in multi-modal fusion.”
— VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition
“Approaches like MultiLoRA and MoELoRA improve LoRA's multi-task performance in joint training scenarios by integrating multiple LoRAs or utilizing expert routing. However, they fail to strike a good balance between task-specific information and task-information sharing, resulting in suboptimal performance.”
— MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
“many MoE-LoRA variants operate at relatively coarse granularity (e.g., selecting experts at layer/module level), and routing can suffer from imbalance or collapse without careful regularization”
— Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
“the reliance on routers prevents the adapted parameters from being merged back into the base model, leading to considerable inference overhead and extra storage requirements, thereby hindering real-world deployment”
— ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation
“Compared with LoRA Fine-Tune, MoELoRA has superior anti-forgetting performance due to the multi-experts mixture mechanism, while it fails in some tasks.”
— Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework
“they implicitly assume that experts operate independently. In practice, this independence amplifies routing noise, induces sharp and low-entropy gating distributions, and causes the routing mass to concentrate on a small subset of experts”
— TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
“However, while these methods effectively mitigate interference, they allocate separate LoRA modules per expert, leading to a multiplicative increase in parameter count as the number of experts grows.”
— Less is More: Resource-Efficient Low-Rank Adaptation
“LoRA-MoE lacks fine-grained rank control due to its expert-level gating”
— Adaptive Capacity Allocation for Vision Language Action Fine-tuning
“Although the MoELoRA and HydraLoRA architectures use different weights for different tokens, they do not adequately address the limitations of shared input-output projections.”
— Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation
“the allocation of expert numbers in the LoRA-MoE architecture still relies on manual settings, potentially leading to significant parameter redundancy and overfitting issues, thereby weakening the model's generalization capability and downstream task performance.”
— A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning
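Two of the critiques above are structural: the router makes the update input-dependent, so it cannot be folded into the base weight, and the parameter count scales with the number of experts. The sketch below illustrates both points with a generic softmax-gated mixture of LoRA experts in NumPy. It is not the MoELoRA authors' implementation: the class name `MoELoRALinear`, the token-level softmax router, the zero initialization of `B`, and all sizes are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def lora_params(d_in, d_out, rank):
    # Trainable parameters of one plain LoRA adapter: A (d_in x r) and B (r x d_out).
    return rank * (d_in + d_out)


class MoELoRALinear:
    """Hypothetical layer: frozen weight W plus a gated mixture of LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=4, rank=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_in, d_out))             # frozen base weight
        self.A = rng.normal(0.0, 0.02, (num_experts, d_in, rank))  # one A per expert
        self.B = np.zeros((num_experts, rank, d_out))              # assumed zero init
        self.router = rng.normal(0.0, 0.02, (d_in, num_experts))   # assumed linear router
        self.scale = alpha / rank

    def gates(self, x):
        # One gate distribution per token: shape (tokens, experts).
        return softmax(x @ self.router, axis=-1)

    def forward(self, x):
        g = self.gates(x)
        h = np.einsum("nd,edr->ner", x, self.A)        # (tokens, experts, rank)
        delta = np.einsum("ner,ero->neo", h, self.B)   # (tokens, experts, d_out)
        # The mixture weights g depend on x, so there is no single matrix
        # W' = W + sum_e g_e A_e B_e that holds for every input.
        return x @ self.W + self.scale * np.einsum("ne,neo->no", g, delta)

    def trainable_params(self):
        return self.A.size + self.B.size + self.router.size


layer = MoELoRALinear(d_in=16, d_out=12, num_experts=4, rank=2)
print(lora_params(16, 12, 2), layer.trainable_params())  # 56 288
```

With these toy sizes the four-expert layer trains 288 parameters against 56 for a single rank-2 LoRA: four times the adapter cost plus the router. Methods that keep a merged adapter avoid both the extra storage and the per-token routing step at inference.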
Beaten on benchmarks
Head-to-head results where a newer method reports beating MoELoRA. Values are copied from the source paper's tables — verify against the cited paper.
- Multiple Choice Learning of Low Rank Adapters for Language Modeling
LoRA-MCL (annealed) beats MoELoRA · SPIDEr [BS (Beam Search) decoding]
0.415 vs 0.405
- Multiple Choice Learning of Low Rank Adapters for Language Modeling
LoRA-MCL beats MoELoRA · Div2 [DBS (Diverse Beam Search) with lambda=0.8, Beam=3]
0.666 vs 0.654
- FT-MDT: Extracting Decision Trees from Medical Texts via a Novel Low-rank Adaptation Method
PI-LoRA beats MoELoRA · F1 (Triplet extract) [Qwen 2.5 7B, Text2MDT]
0.913 vs 0.908
- FT-MDT: Extracting Decision Trees from Medical Texts via a Novel Low-rank Adaptation Method
PI-LoRA beats MoELoRA · Tree_Acc [Qwen 2.5 7B, Text2MDT]
0.772 vs 0.764
- FT-MDT: Extracting Decision Trees from Medical Texts via a Novel Low-rank Adaptation Method
PI-LoRA beats MoELoRA · Tree_Acc [Qwen 2.5 7B, Text2MDT end-to-end]
0.550 vs 0.520
- FT-MDT: Extracting Decision Trees from Medical Texts via a Novel Low-rank Adaptation Method
PI-LoRA beats MoELoRA · DP_F1 [Qwen 2.5 7B, Text2MDT end-to-end]
0.679 vs 0.657
- MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
MTL-LoRA beats MoELoRA · Avg. [commonsense reasoning tasks]
82.1 vs 78.3
- ACE-LoRA: Adaptive Orthogonal Decoupling for Continual Image Editing
ACE-LoRA beats MoELoRA · Overall Score [Avg. metric]
8.8639 vs 7.8801
- Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework
MAGE (Mixture and Aggregation of General LoRA and Expert LoRA) beats MoELoRA · Avg.ACC [non-model-expansion]
49.58 vs 43.90
- Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework
MAGE (Mixture and Aggregation of General LoRA and Expert LoRA) beats MoELoRA · Forgetting [non-model-expansion; lower is better]
12.26 vs 19.08
- TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
TalkLoRA beats MoELoRA · Avg [LLaMA2-7B, r=32]
82.9 vs 78.3
- TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
TalkLoRA beats MoELoRA · Avg [LLaMA3-8B, r=16]
87.4 vs 86.6
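The results above are reported on different scales (fractions, percentages, a roughly 10-point score), so the raw gaps are hard to compare. A minimal sketch of one way to normalize them is to express each gap as a relative change over the MoELoRA score. The helper `relative_gain` and the choice of entries are illustrative; the numbers are copied from the list above, and the sketch assumes every metric is higher-is-better except Forgetting.

```python
def relative_gain(new, base, higher_is_better=True):
    """Improvement over the baseline in percent of the baseline score."""
    diff = new - base if higher_is_better else base - new
    return 100.0 * diff / base


# (method, metric, new score, MoELoRA score, higher_is_better)
results = [
    ("LoRA-MCL (annealed)", "SPIDEr", 0.415, 0.405, True),
    ("PI-LoRA", "Tree_Acc", 0.772, 0.764, True),
    ("MTL-LoRA", "Avg.", 82.1, 78.3, True),
    ("ACE-LoRA", "Overall Score", 8.8639, 7.8801, True),
    ("MAGE", "Forgetting", 12.26, 19.08, False),
    ("TalkLoRA", "Avg (LLaMA3-8B)", 87.4, 86.6, True),
]

for method, metric, new, base, hib in results:
    print(f"{method:20s} {metric:16s} {relative_gain(new, base, hib):+.1f}%")
```

Relative gains do not account for variance across seeds or for differences in experimental setup between papers, so they are a rough reading aid rather than a ranking.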
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 29, 2026
- May 28, 2026
- May 19, 2026
- May 15, 2026
- May 12, 2026
- May 11, 2026
- May 11, 2026
- May 8, 2026
- May 5, 2026
- May 5, 2026
- May 5, 2026
- RDP LoRA
RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models · Apr 21, 2026