HydraLoRA
HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
Mixture-of-experts routing · first seen Apr 30, 2024
heavily superseded — a standard baseline that newer methods routinely beat
1 paper critiques it · 8 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites HydraLoRA as a baseline.
“Compared to existing state-of-the-art MoE baselines (Switch Transformer, MoLE, HydraLoRA), HiLoMoE consistently shows superior efficiency and effectiveness. On average, it improves AUC by 0.08% and reduces LogLoss by 0.10% compared to the best competing MoE variant (HydraLoRA). At the same time, HiLoMoE reduces parameter count by an average of 4.04K, which is equivalent to a 21.0% reduction relative to the most parameter-efficient MoE competitor (HydraLoRA).”
— Hierarchical LoRA MoE for Efficient CTR Model Scaling
Beaten on benchmarks
Head-to-head results where a newer method reports beating HydraLoRA. Values are copied from the source paper's tables — verify against the cited paper.
- Hierarchical LoRA MoE for Efficient CTR Model Scaling
HiLoMoE beats HydraLoRA · AUC [DIEN + KuaiVideo]
0.7446 vs 0.7436
- Hierarchical LoRA MoE for Efficient CTR Model Scaling
HiLoMoE beats HydraLoRA · LogLoss [DIEN + KuaiVideo]
0.4341 vs 0.4374
- Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
GOAT beats HydraLoRA · Average [Image Classification (IC)]
81.49 vs 79.58
- Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
GOAT beats HydraLoRA · GSM8K [Natural Language Generation (NLG)]
60.20 vs 57.39
- Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
GOAT beats HydraLoRA · Average [Natural Language Understanding (NLU)]
89.76 vs 88.56
- MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
MoORE (L=8) beats HydraLoRA · Overall [CSR-MTL multi-task adaptation]
85.11 vs 83.84
- Hierarchical LoRA MoE for Efficient CTR Model Scaling
HiLoMoE beats HydraLoRA · AUC [BST + TaobaoAd]
0.6505 vs 0.6484
- Hierarchical LoRA MoE for Efficient CTR Model Scaling
HiLoMoE beats HydraLoRA · LogLoss [BST + TaobaoAd]
0.1932 vs 0.1938
- LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning
LiMEDoRA beats HydraLoRA · Vision Benchmark [Vision Benchmark]
78.12 vs 78.11
- FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
FourierMoE beats HydraLoRA · Average [Gemma 7B]
88.19 vs 87.01
- FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
FourierMoE beats HydraLoRA · AVG. [LLaMA-3 8B]
73.24 vs 69.63
- Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning
SMoRA beats HydraLoRA · AVERAGE [Llama2-7b]
58.8 vs 54.7
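The reported margins above vary widely in size, so it can help to compute them before drawing conclusions. The following is a minimal sketch, not part of any cited paper: the values are copied from the list above, the helper name `margin` is hypothetical, and it assumes LogLoss is lower-is-better while every other listed metric is higher-is-better.

```python
# Head-to-head values copied from the list above.
# Fields: newer method, metric, setting, newer value, HydraLoRA value, higher_is_better.
# Assumption: LogLoss is lower-is-better; all other metrics are higher-is-better.
results = [
    ("HiLoMoE", "AUC", "DIEN + KuaiVideo", 0.7446, 0.7436, True),
    ("HiLoMoE", "LogLoss", "DIEN + KuaiVideo", 0.4341, 0.4374, False),
    ("GOAT", "Average", "Image Classification", 81.49, 79.58, True),
    ("GOAT", "GSM8K", "NLG", 60.20, 57.39, True),
    ("GOAT", "Average", "NLU", 89.76, 88.56, True),
    ("MoORE (L=8)", "Overall", "CSR-MTL", 85.11, 83.84, True),
    ("HiLoMoE", "AUC", "BST + TaobaoAd", 0.6505, 0.6484, True),
    ("HiLoMoE", "LogLoss", "BST + TaobaoAd", 0.1932, 0.1938, False),
    ("LiMEDoRA", "Vision Benchmark", "Vision Benchmark", 78.12, 78.11, True),
    ("FourierMoE", "Average", "Gemma 7B", 88.19, 87.01, True),
    ("FourierMoE", "AVG.", "LLaMA-3 8B", 73.24, 69.63, True),
    ("SMoRA", "AVERAGE", "Llama2-7b", 58.8, 54.7, True),
]


def margin(newer, baseline, higher_is_better):
    """Return (absolute, relative %) margin in favor of the newer method."""
    diff = newer - baseline if higher_is_better else baseline - newer
    return diff, 100.0 * diff / baseline


# Compute one (absolute, relative) margin per listed result.
margins = [margin(n, b, hib) for (_, _, _, n, b, hib) in results]

for (method, metric, setting, _, _, _), (diff, rel) in zip(results, margins):
    print(f"{method:12s} {metric:17s} [{setting}] +{diff:.4f} ({rel:.2f}% relative)")
```

Printed side by side, the margins range from 0.01 points (LiMEDoRA on the Vision Benchmark) to several points (FourierMoE on LLaMA-3 8B, SMoRA on Llama2-7b). The source papers' tables remain the authority on whether any single margin is meaningful.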
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- PARAMΔ Integration into Upcycled MoE · A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE · May 18, 2026
- MEMIT-like framework for MoE · Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates · May 15, 2026
- May 11, 2026
- May 8, 2026
- Apr 28, 2026
- CoGR-MoE · CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering · Apr 18, 2026
- Apr 2, 2026
- On Token's Dilemma · On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models · Mar 29, 2026
- Mixture-of-Experts (MoE) and Mixture-of-Linear-Experts (MoLE) architectures for MLIPs · Scaling Machine Learning Interatomic Potentials with Mixtures of Experts · Mar 9, 2026
- Mar 5, 2026
- Feb 13, 2026
- Multiscale Interaction Mixture of Experts (MI-MoE) · Topology-Aware Multiscale Mixture of Experts for Efficient Molecular Property Prediction · Jan 19, 2026