Is AdaLoRA superseded?

AdaLoRA (Parameter-efficient fine-tuning, LoRA family) is heavily superseded: it is a standard baseline that newer methods routinely beat. 29 papers critique it and 54 beat it on benchmarks, making it #2 of 1,113 in the most-superseded ranking. It belongs to the sub-problem cluster led by LoRA. Newer alternatives in the same sub-problem include Balanced LoRA, FedSmoothLoRA, FuRA, LoRA-Over, and Hybrid-LoRA.

AdaLoRA

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

Parameter-efficient fine-tuning (LoRA family) · first seen Mar 18, 2023

What papers say

Verbatim critique sentences, each from a paper that cites AdaLoRA as a baseline.

“AdaLoRA uses a single dataset to simultaneously learn rank-1 matrices and their importance scores, which can easily lead to overfitting.”
— AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
“leaves pretraining and finetuning completely intact. There is no need for additional objectives or regularizers, which can slow down convergence and affect the training optimum reached.”
— AdaRank: Disagreement Based Module Rank Prediction for Low-rank Adaptation
“However, the scoring mechanism in AdaLoRA is primarily based on instantaneous gradient signals, which fail to capture long-term parameter contributions and inter-layer interactions.”
— IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
“Methods such as AdaLoRA rely on sensitivity-based importance scores that are unreliable as they only consider how a single parameter change affects the model under the assumption that no other parameters change.”
— FT-MDT: Extracting Decision Trees from Medical Texts via a Novel Low-rank Adaptation Method
“LoRA and AdaLoRA still clearly overfit the training data as fine-tuning advances, with decreases in training losses but increases in test losses.”
— Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach
“However, it is worth noting that AdaLoRA entails additional computational overhead and necessitates a higher initial budget of trainable parameters, making it less suitable for low-resource scenarios.”
— Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning
“AdaLoRA redistributes per-layer rank dynamically but recomputes SVD-based importance at every step and introduces multiple schedule hyperparameters.”
— FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning
“As fine-tuning progresses, the disparity between training and testing losses in both LoRA and AdaLoRA becomes more pronounced. Beyond a certain number of iterations, we observe an increase in test losses alongside a continued decrease in training losses, clearly indicating a tendency for LoRA and AdaLoRA to overfit to the training data.”
— BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models
“they do not couple rank allocation to changes in dense computation, and therefore do not directly optimize for inference efficiency.”
— Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference
“Although these approaches reduce redundancy and improve efficiency, they remain constrained by fixed ranks and show limited flexibility and generalization across datasets and architectures.”
— Regularizing Subspace Redundancy of Low-Rank Adaptation
“introduces orthogonal regularization to ensure that the low-rank projection matrices comply with Singular Value Decomposition (SVD), thus avoiding the reliance on incremental updates, albeit at the cost of increased computational complexity.”
— SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model
“This is a form of structured pruning that tackles the symptom of wasted capacity caused by rank collapse. In contrast, our approach aims to prevent rank collapse by construction.”
— OrthoGeoLoRA: Geometric Parameter-Efficient Fine-Tuning for Structured Social Science Concept Retrieval on the Web
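
Several of the critiques above refer to the same three pieces of AdaLoRA: an SVD-style update ΔW = P·diag(λ)·Q, an orthogonality regularizer on P and Q, and pruning of singular values by importance score under a rank budget. The following is a minimal pure-Python sketch of those pieces, not AdaLoRA's actual implementation. All function names are hypothetical, the importance scores are assumed to be given rather than computed from gradients, and the budget is applied to a single matrix for simplicity.

```python
# Illustrative sketch only; helper names are hypothetical and not from any library.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def orth_penalty(p, q):
    """Compute ||P^T P - I||_F^2 + ||Q Q^T - I||_F^2 for P (d1 x r) and Q (r x d2)."""
    def gram_gap(g):
        # Squared Frobenius distance between a square Gram matrix and the identity.
        n = len(g)
        return sum((g[i][j] - (1.0 if i == j else 0.0)) ** 2
                   for i in range(n) for j in range(n))
    return gram_gap(matmul(transpose(p), p)) + gram_gap(matmul(q, transpose(q)))

def prune_to_budget(singular_values, scores, budget):
    """Keep the `budget` singular values with the highest scores and zero the rest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:budget])
    return [s if i in keep else 0.0 for i, s in enumerate(singular_values)]

def delta_w(p, lam, q):
    """Assemble the low-rank update Delta W = P diag(lam) Q."""
    scaled_q = [[lam[i] * v for v in row] for i, row in enumerate(q)]
    return matmul(p, scaled_q)

# Toy example: a rank-2 update on a 3 x 3 weight, pruned to a budget of 1.
P = [[1, 0], [0, 1], [0, 0]]
Q = [[1, 0, 0], [0, 1, 0]]
lam = prune_to_budget([0.5, 2.0], scores=[0.1, 0.9], budget=1)
update = delta_w(P, lam, Q)
penalty = orth_penalty(P, Q)
```

In this toy setup the penalty is zero because P has orthonormal columns and Q has orthonormal rows. During training the penalty would typically be added to the task loss with a small weight, and the scores would come from smoothed gradient-based sensitivity estimates, which is the part several of the quoted papers criticize.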

Beaten on benchmarks

Head-to-head results where a newer method reports beating AdaLoRA. Values are copied from each source paper's tables; verify them against the cited paper.

NoRA beats AdaLoRA · Average Accuracy [Fine-tuning on LLaMA-3 8B]
83.1 vs 81.4
NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models
NoRA beats AdaLoRA · Average Accuracy [Shots 4]
81.8 vs 81.5
NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models
NoRA beats AdaLoRA · Average Accuracy [Shots 16]
85.4 vs 84.1
NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models
Sensitivity-LoRA beats AdaLoRA · Avg. [GLUE benchmark, RoBERTa-base]
85.94 vs 85.20
Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models
Sensitivity-LoRA beats AdaLoRA · Avg. [Qwen2.5-7B]
37.98 vs 37.39
Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models
Sensitivity-LoRA beats AdaLoRA · Avg. [LLaMA3.1-8B]
49.57 vs 48.80
Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models
AutoLoRA beats AdaLoRA · Avg [GLUE benchmark, RoBERTa-base]
85.5 vs 85.0
AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
AutoLoRA beats AdaLoRA · BLEU [NLG, GPT2-medium on E2E+WebNLG]
67.9 vs 67.0
AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
AutoLoRA beats AdaLoRA · F1 [BioNLP, RoBERTa-base]
74.2 vs 73.0
AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
FlexLoRA beats AdaLoRA · Average Score [DeBERTaV3-base]
89.1 vs 88.1
FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
FlexLoRA beats AdaLoRA · Average Accuracy [ViT-B/16, VTAB]
67.8 vs 64.7
FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
IGLoRA beats AdaLoRA · Avg [RoBERTa-large, GLUE benchmark]
89.42 vs 88.68
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
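
One way to read the listing above is by the size of each reported gap. The sketch below copies the twelve value pairs from the listing and computes the margin (newer method minus AdaLoRA) for each. The names `RESULTS`, `margins`, and `extremes` are illustrative, and margins on different metrics and benchmarks are not directly comparable, so this only indicates the scale of the reported differences.

```python
# Values copied from the listing above:
# (newer method, metric and setting, newer method's score, AdaLoRA's score).
RESULTS = [
    ("NoRA", "Average Accuracy, LLaMA-3 8B", 83.1, 81.4),
    ("NoRA", "Average Accuracy, Shots 4", 81.8, 81.5),
    ("NoRA", "Average Accuracy, Shots 16", 85.4, 84.1),
    ("Sensitivity-LoRA", "Avg., GLUE, RoBERTa-base", 85.94, 85.20),
    ("Sensitivity-LoRA", "Avg., Qwen2.5-7B", 37.98, 37.39),
    ("Sensitivity-LoRA", "Avg., LLaMA3.1-8B", 49.57, 48.80),
    ("AutoLoRA", "Avg, GLUE, RoBERTa-base", 85.5, 85.0),
    ("AutoLoRA", "BLEU, E2E+WebNLG", 67.9, 67.0),
    ("AutoLoRA", "F1, BioNLP, RoBERTa-base", 74.2, 73.0),
    ("FlexLoRA", "Average Score, DeBERTaV3-base", 89.1, 88.1),
    ("FlexLoRA", "Average Accuracy, ViT-B/16, VTAB", 67.8, 64.7),
    ("IGLoRA", "Avg, RoBERTa-large, GLUE", 89.42, 88.68),
]

def margins(results):
    """Return (method, setting, margin) rows, with the margin rounded to 2 decimals."""
    return [(method, setting, round(new - old, 2))
            for method, setting, new, old in results]

def extremes(results):
    """Return the rows with the smallest and the largest margin."""
    rows = margins(results)
    return min(rows, key=lambda r: r[2]), max(rows, key=lambda r: r[2])

smallest, largest = extremes(RESULTS)
for method, setting, gap in margins(RESULTS):
    print(f"{method:18s} {setting:34s} +{gap:.2f}")
```

On these copied values every margin is positive, as expected for a "beaten on benchmarks" list, and most are around one point or less.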

Newer alternatives

Recent methods in the same sub-problem that are not yet superseded in the knowledge base: Balanced LoRA, FedSmoothLoRA, FuRA, LoRA-Over, Hybrid-LoRA.