PiSSA
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Parameter-efficient fine-tuning (LoRA family) · first seen Apr 3, 2024
heavily superseded — a standard baseline that newer methods routinely beat
14 papers critique it · 30 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites PiSSA as a baseline.
“Despite its effectiveness and popularity, recent studies have underscored that LoRA and its variants face challenges such as diminishing performance~LoRA, and slower convergence~PiSSA relative to full fine-tuning, which deteriorate further as the rank declines~MoRA,HiRA.”
— ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning
“SVD-based initialization is computationally expensive and requires a long time”
— NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models
“SVD-based initialization is only a soft constraint, allowing updates to drift away from the pretrained subspace during training”
— FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning
“However, it still operates under a fixed low-rank constraint, which restricts its capacity to high-complexity tasks.”
— HD-PiSSA: High-Rank Distributed Orthogonal Adaptation
“such SFT-oriented spectral priors can create a fundamental geometric mismatch in RLVR, whose optimization dynamics and effective update patterns differ markedly from SFT”
— GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR
“they are less pluggable than LoRA, requiring extra computational pipelines and storage for SVD buffers.”
— The Primacy of Magnitude in Low-Rank Adaptation
“However, PiSSA operates independently on each layer's weight matrix, missing cross-layer correlations.”
— LORA-CRAFT: Cross-layer Rank Adaptation via Frozen Tucker Decomposition of Pre-trained Attention Weights
“Recently, PiSSA meng2024pissa proposes to initializing $A$ and $B$ to approximate the original matrix $W$, by performing SVD on $W$. Our method, however, is based on a very different idea, that is to approximate the gradient of $W$, which involves performing SVD on sampled gradients and properly scaling the initialized matrices, as detailed in Section~compare_pissa.”
— LoRA-GA: Low-Rank Adaptation with Gradient Approximation
“OLoRA~buyukakyuz2024olora and PiSSA~meng2024pissa ease optimization by initializing LoRA orthogonally. However, they remove important pre-trained components from the frozen base weights.”
— OP-LoRA: The Blessing of Dimensionality
“we enforce the rows of $A$ to be orthonormal by initializing them with right singular vectors of $BA$, which empirically stabilizes training and accelerates optimization compared to a non-orthonormal structure.”
— FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA
“These approaches primarily operate in weight space and can improve optimization and generalization under moderate ranks. A common assumption underlying these methods is that task-relevant adaptation directions can be inferred directly from the pretrained weight geometry, without explicit reference to data-induced activation patterns.”
— When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning
“However, these works can only solve either side of the two problems, but do not consider the trade-off between enhancing fine-tuning performance and preserving pre-trained knowledge”
— SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA
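Several of the critiques above hinge on how an SVD-based initialization splits a pretrained weight into a trainable low-rank part and a frozen remainder. The following is a minimal NumPy sketch of that idea, not the PiSSA authors' code. It assumes the LoRA convention $W \approx W_{res} + BA$, and it assumes the top-$r$ singular values are shared between the two factors as square roots. The function name `pissa_style_init` and the toy matrix sizes are hypothetical.

```python
import numpy as np

def pissa_style_init(W, r):
    """Split W into a trainable rank-r principal part (B @ A) and a residual.

    Assumption: singular values are divided evenly (as square roots)
    between the two factors. This is an illustrative sketch only.
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    B = U[:, :r] * sqrt_S            # shape (out_dim, r)
    A = sqrt_S[:, None] * Vt[:r]     # shape (r, in_dim)
    W_res = W - B @ A                # what remains after removing the top-r part
    return A, B, W_res

# Toy stand-in for one layer's weight matrix (hypothetical sizes).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
A, B, W_res = pissa_style_init(W, r=4)

# At initialization the adapted layer reproduces W exactly.
reconstruction_error = np.abs(W_res + B @ A - W).max()
```

In this sketch `W_res` would be the frozen base and `A`, `B` the trainable factors, so the frozen matrix no longer contains the top-$r$ singular directions. The SVD is computed once per weight matrix and independently for each layer, which is the setting the cost and cross-layer critiques refer to.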
Beaten on benchmarks
Head-to-head results where a newer method reports beating PiSSA. Values are copied from the source paper's tables — verify against the cited paper.
- CoLA: Collaborative Low-Rank Adaptation
CoLA beats PiSSA · Accuracy [Generality domain, Llama-3.1-8B]
58.04 vs 54.72
- CoLA: Collaborative Low-Rank Adaptation
CoLA beats PiSSA · Accuracy [Law domain, Llama-3.1-8B]
36.25 vs 26.26
- CoLA: Collaborative Low-Rank Adaptation
CoLA beats PiSSA · Accuracy [Medicine domain, Llama-3.1-8B]
56.11 vs 44.64
- CoLA: Collaborative Low-Rank Adaptation
CoLA beats PiSSA · Accuracy [Math domain, Llama-3.1-8B]
57.71 vs 47.31
- SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values
SVFit beats PiSSA · Avg. [RoBERTa-base]
85.1 vs 84.4
- SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values
SVFit beats PiSSA · Avg. [RoBERTa-large]
88.5 vs 88.2
- SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values
SVFit beats PiSSA · Avg. [ViT-base]
84.3 vs 84.2
- Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · avg [Model Avg. / Math]
46.1 vs 44.7
- Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · avg [Model Avg. / Code]
55.9 vs 52.4
- Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · avg [Model Avg. / Task Avg.]
51.0 vs 48.5
- Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · min [Model Avg. / Task Avg.]
52.2 vs 50.8
- Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · max [Model Avg. / Task Avg.]
53.9 vs 53.0
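The reported gaps vary widely in size, so it can help to compute the margins explicitly. The short sketch below does that arithmetic using only the values listed above. The tuple layout and variable names are illustrative assumptions, and the numbers remain subject to the same caveat about verifying them against the cited papers.

```python
# (method, setting, newer method's score, PiSSA's score), copied from the list above.
results = [
    ("CoLA", "Accuracy, Generality domain, Llama-3.1-8B", 58.04, 54.72),
    ("CoLA", "Accuracy, Law domain, Llama-3.1-8B", 36.25, 26.26),
    ("CoLA", "Accuracy, Medicine domain, Llama-3.1-8B", 56.11, 44.64),
    ("CoLA", "Accuracy, Math domain, Llama-3.1-8B", 57.71, 47.31),
    ("SVFit", "Avg., RoBERTa-base", 85.1, 84.4),
    ("SVFit", "Avg., RoBERTa-large", 88.5, 88.2),
    ("SVFit", "Avg., ViT-base", 84.3, 84.2),
    ("NB", "avg, Model Avg. / Math", 46.1, 44.7),
    ("NB", "avg, Model Avg. / Code", 55.9, 52.4),
    ("NB", "avg, Model Avg. / Task Avg.", 51.0, 48.5),
    ("NB", "min, Model Avg. / Task Avg.", 52.2, 50.8),
    ("NB", "max, Model Avg. / Task Avg.", 53.9, 53.0),
]

# Absolute margin in points, rounded to hide floating-point noise.
margins = [
    (method, setting, round(new - pissa, 2))
    for method, setting, new, pissa in results
]

largest = max(margins, key=lambda row: row[2])
smallest = min(margins, key=lambda row: row[2])

for method, setting, margin in margins:
    print(f"{method:6s} {setting:45s} +{margin:.2f}")
```

The scores come from different papers, models, and metrics, so the margins are only comparable within a single source.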
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 29, 2026
- May 28, 2026
- May 19, 2026
- May 15, 2026
- May 12, 2026
- May 11, 2026
- May 11, 2026
- May 8, 2026
- May 5, 2026
- May 5, 2026
- May 5, 2026
- RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models · Apr 21, 2026