PiSSA (parameter-efficient fine-tuning, LoRA family) is heavily superseded: a standard baseline that newer methods routinely beat. 14 papers critique it and 30 beat it on benchmarks, placing it #4 of 1,113 on the most-superseded list. Sub-problem: the cluster led by LoRA. Newer alternatives in the same sub-problem include Balanced LoRA, FedSmoothLoRA, FuRA, LoRA-Over, and Hybrid-LoRA.

Heavily superseded · #4 of 1,113 most-superseded

PiSSA

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models

Parameter-efficient fine-tuning (LoRA family) · first seen Apr 3, 2024

heavily superseded — a standard baseline that newer methods routinely beat

14 papers critique it · 30 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites PiSSA as a baseline.

“Despite its effectiveness and popularity, recent studies have underscored that LoRA and its variants face challenges such as diminishing performance [LoRA], and slower convergence [PiSSA] relative to full fine-tuning, which deteriorate further as the rank declines [MoRA, HiRA].”
— ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning
“SVD-based initialization is computationally expensive and requires a long time”
— NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models
“SVD-based initialization is only a soft constraint, allowing updates to drift away from the pretrained subspace during training”
— FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning
“However, it still operates under a fixed low-rank constraint, which restricts its capacity to high-complexity tasks.”
— HD-PiSSA: High-Rank Distributed Orthogonal Adaptation
“such SFT-oriented spectral priors can create a fundamental geometric mismatch in RLVR, whose optimization dynamics and effective update patterns differ markedly from SFT”
— GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR
“they are less pluggable than LoRA, requiring extra computational pipelines and storage for SVD buffers.”
— The Primacy of Magnitude in Low-Rank Adaptation
“However, PiSSA operates independently on each layer's weight matrix, missing cross-layer correlations.”
— LORA-CRAFT: Cross-layer Rank Adaptation via Frozen Tucker Decomposition of Pre-trained Attention Weights
“Recently, PiSSA [meng2024pissa] proposes to initializing $A$ and $B$ to approximate the original matrix $W$, by performing SVD on $W$. Our method, however, is based on a very different idea, that is to approximate the gradient of $W$, which involves performing SVD on sampled gradients and properly scaling the initialized matrices, as detailed in Section~compare_pissa.”
— LoRA-GA: Low-Rank Adaptation with Gradient Approximation
“OLoRA [buyukakyuz2024olora] and PiSSA [meng2024pissa] ease optimization by initializing LoRA orthogonally. However, they remove important pre-trained components from the frozen base weights.”
— OP-LoRA: The Blessing of Dimensionality
“we enforce the rows of $A$ to be orthonormal by initializing them with right singular vectors of $BA$, which empirically stabilizes training and accelerates optimization compared to a non-orthonormal structure.”
— FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA
“These approaches primarily operate in weight space and can improve optimization and generalization under moderate ranks. A common assumption underlying these methods is that task-relevant adaptation directions can be inferred directly from the pretrained weight geometry, without explicit reference to data-induced activation patterns.”
— When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning
“However, these works can only solve either side of the two problems, but do not consider the trade-off between enhancing fine-tuning performance and preserving pre-trained knowledge”
— SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA
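
Several of the critiques above target the same mechanism: PiSSA runs an SVD on each pretrained weight matrix, places the top-r singular components in the trainable factors, and freezes what is left. A minimal NumPy sketch of that initialization follows. The function name `pissa_init`, the matrix sizes, and the convention that `A` is the left factor are illustrative assumptions, not taken from any cited paper's code.

```python
import numpy as np

def pissa_init(W, r):
    """Split W into a trainable rank-r principal part (A @ B) and a frozen residual.

    Hypothetical helper for illustration; the factor naming is an assumption.
    """
    # Full SVD of the pretrained weight: the step critics call expensive.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    A = U[:, :r] * sqrt_S            # (m, r): top-r left singular vectors, scaled
    B = sqrt_S[:, None] * Vt[:r, :]  # (r, n): top-r right singular vectors, scaled
    W_res = W - A @ B                # frozen base weight, principal part removed
    return A, B, W_res

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))    # stand-in for one layer's weight matrix
A, B, W_res = pissa_init(W, r=8)

# At initialization the adapted layer reproduces the pretrained weight.
recon_ok = np.allclose(W_res + A @ B, W)
# The frozen residual's largest singular value is the (r+1)-th singular value of W.
residual_top = np.linalg.norm(W_res, 2)
```

The sketch makes three of the quoted objections concrete: the SVD runs once per layer and independently of every other layer, the frozen `W_res` no longer contains the principal components, and the trainable update stays at rank `r` throughout training.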

Beaten on benchmarks

Head-to-head results where a newer method reports beating PiSSA. Values are copied from the source paper's tables — verify against the cited paper.

CoLA beats PiSSA · Accuracy [Generality domain, Llama-3.1-8B]
58.04 vs 54.72
CoLA: Collaborative Low-Rank Adaptation
CoLA beats PiSSA · Accuracy [Law domain, Llama-3.1-8B]
36.25 vs 26.26
CoLA: Collaborative Low-Rank Adaptation
CoLA beats PiSSA · Accuracy [Medicine domain, Llama-3.1-8B]
56.11 vs 44.64
CoLA: Collaborative Low-Rank Adaptation
CoLA beats PiSSA · Accuracy [Math domain, Llama-3.1-8B]
57.71 vs 47.31
CoLA: Collaborative Low-Rank Adaptation
SVFit beats PiSSA · Avg. [RoBERTa-base]
85.1 vs 84.4
SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values
SVFit beats PiSSA · Avg. [RoBERTa-large]
88.5 vs 88.2
SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values
SVFit beats PiSSA · Avg. [ViT-base]
84.3 vs 84.2
SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values
NB beats PiSSA · avg [Model Avg. / Math]
46.1 vs 44.7
Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · avg [Model Avg. / Code]
55.9 vs 52.4
Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · avg [Model Avg. / Task Avg.]
51.0 vs 48.5
Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · min [Model Avg. / Task Avg.]
52.2 vs 50.8
Norm-Bounded Low-Rank Adaptation
NB beats PiSSA · max [Model Avg. / Task Avg.]
53.9 vs 53.0
Norm-Bounded Low-Rank Adaptation

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base, include Balanced LoRA, FedSmoothLoRA, FuRA, LoRA-Over, and Hybrid-LoRA.