Is HydraLoRA superseded?

HydraLoRA (Mixture-of-experts routing) is heavily superseded: a standard baseline that newer methods routinely beat. 1 paper critiques it and 8 beat it on benchmarks, ranking it #4 of 1,370 most-superseded. Sub-problem: cluster led by HydraLoRA. Newer alternatives in the same sub-problem include PARAMΔ Integration into Upcycled MoE, MEMIT-like framework for MoE, HELLoRA, SDG-MoE, and Marco-MoE.

Heavily superseded · #4 of 1,370 most-superseded

HydraLoRA

HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

Mixture-of-experts routing · first seen Apr 30, 2024

1 paper critiques it · 8 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites HydraLoRA as a baseline.

“Compared to existing state-of-the-art MoE baselines (Switch Transformer, MoLE, HydraLoRA), HiLoMoE consistently shows superior efficiency and effectiveness. On average, it improves AUC by 0.08% and reduces LogLoss by 0.10% compared to the best competing MoE variant (HydraLoRA). At the same time, HiLoMoE reduces parameter count by an average of 4.04K, which is equivalent to a 21.0% reduction relative to the most parameter-efficient MoE competitor (HydraLoRA).”
— Hierarchical LoRA MoE for Efficient CTR Model Scaling

Beaten on benchmarks

Head-to-head results where a newer method reports beating HydraLoRA. Values are copied from the source papers' tables; verify each against the cited paper.

HiLoMoE beats HydraLoRA · AUC [DIEN + KuaiVideo]
0.7446 vs 0.7436
Hierarchical LoRA MoE for Efficient CTR Model Scaling

HiLoMoE beats HydraLoRA · LogLoss [DIEN + KuaiVideo]
0.4341 vs 0.4374
Hierarchical LoRA MoE for Efficient CTR Model Scaling

GOAT beats HydraLoRA · Average [Image Classification (IC)]
81.49 vs 79.58
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

GOAT beats HydraLoRA · GSM8K [Natural Language Generation (NLG)]
60.20 vs 57.39
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

GOAT beats HydraLoRA · Average [Natural Language Understanding (NLU)]
89.76 vs 88.56
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

MoORE (L=8) beats HydraLoRA · Overall [CSR-MTL multi-task adaptation]
85.11 vs 83.84
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

HiLoMoE beats HydraLoRA · AUC [BST + TaobaoAd]
0.6505 vs 0.6484
Hierarchical LoRA MoE for Efficient CTR Model Scaling

HiLoMoE beats HydraLoRA · LogLoss [BST + TaobaoAd]
0.1932 vs 0.1938
Hierarchical LoRA MoE for Efficient CTR Model Scaling

LiMEDoRA beats HydraLoRA · Vision Benchmark [Vision Benchmark]
78.12 vs 78.11
LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

FourierMoE beats HydraLoRA · Average [Gemma 7B]
88.19 vs 87.01
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

FourierMoE beats HydraLoRA · AVG. [LLaMA-3 8B]
73.24 vs 69.63
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

SMoRA beats HydraLoRA · AVERAGE [Llama2-7b]
58.8 vs 54.7
Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning
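
The margins behind these head-to-head results vary widely, so it can help to compute them rather than read "beats" at face value. The following is a minimal Python sketch, not part of the knowledge base itself: the names RESULTS, margin, and summarize are hypothetical, the values are the ones listed above, and it assumes LogLoss is lower-is-better while every other metric listed is higher-is-better.

```python
# Each row: (newer method, metric, setting, newer score, HydraLoRA score, lower_is_better).
# Scores are copied from the listing above; the lower_is_better flags are an
# assumption (True only for LogLoss).
RESULTS = [
    ("HiLoMoE", "AUC", "DIEN + KuaiVideo", 0.7446, 0.7436, False),
    ("HiLoMoE", "LogLoss", "DIEN + KuaiVideo", 0.4341, 0.4374, True),
    ("GOAT", "Average", "Image Classification (IC)", 81.49, 79.58, False),
    ("GOAT", "GSM8K", "Natural Language Generation (NLG)", 60.20, 57.39, False),
    ("GOAT", "Average", "Natural Language Understanding (NLU)", 89.76, 88.56, False),
    ("MoORE (L=8)", "Overall", "CSR-MTL multi-task adaptation", 85.11, 83.84, False),
    ("HiLoMoE", "AUC", "BST + TaobaoAd", 0.6505, 0.6484, False),
    ("HiLoMoE", "LogLoss", "BST + TaobaoAd", 0.1932, 0.1938, True),
    ("LiMEDoRA", "Vision Benchmark", "Vision Benchmark", 78.12, 78.11, False),
    ("FourierMoE", "Average", "Gemma 7B", 88.19, 87.01, False),
    ("FourierMoE", "AVG.", "LLaMA-3 8B", 73.24, 69.63, False),
    ("SMoRA", "AVERAGE", "Llama2-7b", 58.8, 54.7, False),
]


def margin(newer, baseline, lower_is_better):
    """Signed improvement of the newer method; positive means it is better."""
    return (baseline - newer) if lower_is_better else (newer - baseline)


def summarize(results):
    """Return rows of (method, metric, setting, absolute margin, relative margin in %),
    sorted from the narrowest to the widest relative margin."""
    rows = []
    for method, metric, setting, newer, baseline, lower in results:
        m = margin(newer, baseline, lower)
        rows.append((method, metric, setting, round(m, 4), round(100 * m / baseline, 2)))
    return sorted(rows, key=lambda row: row[4])


if __name__ == "__main__":
    for method, metric, setting, abs_m, rel_m in summarize(RESULTS):
        print(f"{method:12s} {metric:17s} [{setting}]: +{abs_m} ({rel_m}% relative)")
```

Sorting by relative margin puts near-ties such as the LiMEDoRA result (78.12 vs 78.11) next to the larger gaps, which is useful context when deciding how much weight a single "beats" entry deserves. Relative margins are not comparable across metrics with different scales, so treat the ordering as a rough guide only.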

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.