DoRA (Parameter-efficient fine-tuning, LoRA family): heavily superseded, a standard baseline that newer methods routinely beat. 16 papers critique it and 66 beat it on benchmarks (#3 of 1,113 most-superseded). It sits in the sub-problem cluster led by LoRA. Newer alternatives in the same sub-problem include Balanced LoRA, FedSmoothLoRA, FuRA, LoRA-Over, and Hybrid-LoRA.

Heavily superseded · #3 of 1,113 most-superseded

DoRA

DoRA: Weight-Decomposed Low-Rank Adaptation

Parameter-efficient fine-tuning (LoRA family) · first seen Feb 14, 2024

heavily superseded — a standard baseline that newer methods routinely beat

16 papers critique it · 66 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites DoRA as a baseline.

“DoRA decomposes the model weights into their directional and magnitude components and fine-tunes both, but only the former remains low-rank.”
— LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
“This incurs O(d_in^2) memory for the identity matrix alone: 32 MB at d_in=4096, 128 MB at d_in=8192 in bf16. Including the dense BA product and composed-weight copy, a single module allocates 3–4 dense [d_out, d_in] temporaries: 512 MB at d_in=8192.”
— Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels
“DoRA implicitly assumes that the direction of a matrix can be decomposed into per-column units, an assumption that lacks a clear theoretical grounding in matrix analysis.”
— MAP: Revisiting Weight Decomposition for Low-Rank Adaptation
“DoRA relies on strict normalization, which makes it sensitive to optimization instabilities: when the adapted weight norm approaches zero, gradients can explode, destabilizing training.”
— DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks
“Nonetheless, DoRA introduces additional parameters and over-expressive architecture compared to LoRA, which can exacerbate overfitting issues when adapting to small downstream datasets.”
— BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
“These approaches primarily operate in weight space and can improve optimization and generalization under moderate ranks. A common assumption underlying these methods is that task-relevant adaptation directions can be inferred directly from the pretrained weight geometry, without explicit reference to data-induced activation patterns.”
— When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning
“decomposes pretrained weights into magnitude and direction components, utilizing LoRA for directional updates, reducing trainable parameters and enhancing fine-tuning performance, though its complexity and dependence on data quality may limit its effectiveness.”
— SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model
“Although DoRA improves LoRA's learning capacity, its parameter count scales with the model's dimensionality since the magnitude component in DoRA is an n-dimensional trainable vector, where n represents the number of columns of the weight matrix.”
— EDoRA: Efficient Weight-Decomposed Low-Rank Adaptation via Singular Value Decomposition
“This directly addresses a symptom of the scale ambiguity we identified, but it provides a heuristic fix without altering the core $BA^$ parameterization that creates the ambiguity.”
— OrthoGeoLoRA: Geometric Parameter-Efficient Fine-Tuning for Structured Social Science Concept Retrieval on the Web
“DoRA [liu2024dora] and LoRA+ [hayou2024lora], address limitations in LoRA's training dynamics”
— The Quest for Winning Tickets in Low-Rank Adapters
“Learning parameter-based adaptation methods may struggle to generalize to out-of-distribution tasks, particularly when the injection of additional parameters is suboptimally placed, potentially leading to degraded performance”
— Surgical AI Copilot: Energy-Based Fourier Gradient Low-Rank Adaptation for Surgical LLM Agent Reasoning and Planning
“SoRA and DoRA both incur additional training overhead in the form of architectural modifications, importance calculations, additional regularization terms, or bespoke optimization strategies.”
— Post-Optimization Adaptive Rank Allocation for LoRA
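
Several of the critiques above turn on the same mechanics: a per-column magnitude vector, a column-wise normalization, and dense temporaries. The following is a minimal sketch of that decomposition in plain Python, not the DoRA authors' implementation. It assumes the weight is stored as [d_out, d_in], that the magnitude vector has one entry per column (as the EDoRA quote describes), and that d_out = d_in when reproducing the memory figures; the helper names (dora_weight, trainable_params, mib) and the toy matrices are hypothetical.

```python
import math

def matmul(X, Y):
    """Plain-Python matrix product of X (p x q) and Y (q x s)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def column_norms(W):
    """L2 norm of each column of W."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*W)]

def dora_weight(W0, B, A, m):
    """Compose m * (W0 + BA) / ||W0 + BA||_c, normalizing column by column.

    Division by the column norm is the step the DoRAN quote flags:
    a norm close to zero makes this ratio (and its gradient) blow up.
    """
    BA = matmul(B, A)  # dense [d_out, d_in] temporary
    V = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W0, BA)]
    norms = column_norms(V)
    return [[m[j] * v / norms[j] for j, v in enumerate(row)] for row in V]

def trainable_params(d_out, d_in, r):
    """(low-rank parameters, magnitude parameters) for one adapted matrix."""
    return r * (d_out + d_in), d_in  # one magnitude entry per column (assumed)

def mib(n_elements, bytes_per_element=2):
    """Size in MiB; 2 bytes per element corresponds to bf16."""
    return n_elements * bytes_per_element / 2**20

# Toy shapes (hypothetical): d_out=3, d_in=2, rank r=1.
W0 = [[3.0, 0.0], [4.0, 1.0], [0.0, 0.0]]
A = [[0.5, -0.5]]
B_zero = [[0.0], [0.0], [0.0]]   # zero init, so BA = 0 before training
m = column_norms(W0)             # magnitude starts at the column norms of W0

W_init = dora_weight(W0, B_zero, A, m)           # reproduces W0 at initialization
W_moved = dora_weight(W0, [[1.0], [0.0], [2.0]], A, m)
# With m held fixed, the low-rank update changes direction only:
# the column norms of W_moved still equal m (up to rounding).

print(trainable_params(4096, 4096, 16))  # (131072, 4096)
print(mib(4096 * 4096))                  # 32.0  (identity matrix, d_in=4096)
print(mib(8192 * 8192))                  # 128.0 (identity matrix, d_in=8192)
print(mib(4 * 8192 * 8192))              # 512.0 (four dense temporaries)
```

Under these assumptions the quoted memory figures are consistent with binary megabytes (2^20 bytes), and the 512 MB figure matches four dense temporaries on a square 8192 x 8192 layer. The parameter count shows why the magnitude vector grows with layer width while the low-rank factors grow with rank.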

Beaten on benchmarks

Head-to-head results where a newer method reports beating DoRA. Values are copied from the source paper's tables — verify against the cited paper.

LoFT beats DoRA · average accuracy [LLaMA-7B, r=16]
76.08 vs 71.11
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
LoFT beats DoRA · average accuracy [LLaMA2-7B, r=16]
80.46 vs 79.71
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
LoFT beats DoRA · average accuracy [LLaMA3-8B, r=16]
85.63 vs 84.96
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
LoFT beats DoRA · average accuracy [ViT-Base, r=16]
76.12 vs 74.74
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
FLoRA beats DoRA · Avg [12.77M-param budget]
52.5 vs 45.8
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
FLoRA beats DoRA · Avg [25.65M-param budget]
53.7 vs 46.9
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
FLoRA beats DoRA · Avg [40.49M-param budget]
54.7 vs 45.0
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
FLoRA beats DoRA · All [0.33M-param budget]
89.21 vs 88.31
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
FLoRA beats DoRA · All [1.33M-param budget]
89.80 vs 88.49
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
FLoRA beats DoRA · Avg [4.63%-param budget]
67.8 vs 67.6
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
IGLoRA beats DoRA · Avg [RoBERTa-large, GLUE benchmark]
89.42 vs 88.75
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
RoSA beats DoRA · micro-avg(%) [Qwen2.5-7B]
85.9 vs 84.9
RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.