BitFit
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
Parameter-efficient fine-tuning (LoRA family) · first seen Jun 18, 2021
heavily superseded — a standard baseline that newer methods routinely beat
7 papers critique it · 28 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites BitFit as a baseline.
“Yet, because weight matrices are not changed %the bias terms remain constant across inputs % %(see Section~sec: method), %variation between samples are not taken into account during fine-tuning. As a result, % they may not reach the performance achieved by other methods, such as Low-Rank Adaptation (LoRA) hu2021lora, pu2023empirical.”
— 1LoRA: Summation Compression for Very Low-Rank Adaptation
“Unlike the previous literature like Liu2022FewShotPF or BenZaken2021BitFitSP, we introduce a novel prompt-aware mechanism to the PEFT method.”
— PARA: Parameter-Efficient Fine-tuning with Prompt Aware Representation Adjustment
“This includes methods previously considered less effective, such as BitFit~zaken2022bitfit, which FT only the bias terms of the frozen backbone.”
— Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
“Research based on sparse methods zaken2021bitfit optimizes the bias parameter to reduce cost and improve the performance of the model during specific tasks but faces problems in dealing with real data.”
— Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
“BitFit is susceptible to catastrophic forgetting when the target task has a large zero-shot silhouette score”
— Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP
“We hypothesize, however, that the bias may not necessarily be the optimal component of BERT for parameter-efficient fine-tuning, and similar/better performance could be obtained by training a smaller number of parameters if the optimal component is chosen.”
— LayerNorm: A key component in parameter-efficient fine-tuning
“BitFit method zaken2021bitfit updates only the bias parameters, resulting in a substantial reduction in the number of trainable parameters, but at the cost of suboptimal performance.”
— PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
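Several of the critiques above turn on the same point: BitFit updates only bias terms, so very few parameters are trainable. The sketch below illustrates that selection rule and the resulting parameter count. The parameter names and shapes are hypothetical (one BERT-base-sized encoder block with hidden size 768 and feed-forward size 3072, LayerNorm omitted); they are not taken from the BitFit paper or any cited work.

```python
from math import prod

# Hypothetical parameter shapes for a single encoder block.
# Names and sizes are illustrative assumptions; LayerNorm is left out for brevity.
PARAM_SHAPES = {
    "attention.query.weight": (768, 768),
    "attention.query.bias": (768,),
    "attention.key.weight": (768, 768),
    "attention.key.bias": (768,),
    "attention.value.weight": (768, 768),
    "attention.value.bias": (768,),
    "attention.output.weight": (768, 768),
    "attention.output.bias": (768,),
    "ffn.in.weight": (3072, 768),
    "ffn.in.bias": (3072,),
    "ffn.out.weight": (768, 3072),
    "ffn.out.bias": (768,),
}


def bitfit_trainable(param_shapes):
    """Return the names a bias-only (BitFit-style) scheme would leave trainable."""
    return [name for name in param_shapes if name.endswith(".bias")]


def count_params(param_shapes, names):
    """Sum the element counts of the named parameters."""
    return sum(prod(param_shapes[name]) for name in names)


trainable_names = bitfit_trainable(PARAM_SHAPES)
trainable = count_params(PARAM_SHAPES, trainable_names)
total = count_params(PARAM_SHAPES, PARAM_SHAPES)
print(f"trainable: {trainable} of {total} ({100 * trainable / total:.3f}%)")
```

Under these assumed shapes, the bias terms come to 6,912 of 7,084,800 parameters, roughly 0.1% of the block. In a real framework the same rule is usually applied by freezing every parameter whose name does not end in `bias`; the exact naming depends on the model implementation.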
Beaten on benchmarks
Head-to-head results where a newer method reports beating BitFit. Values are copied from the source paper's tables — verify against the cited paper.
- Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness across Diverse Tasks
Heart-LoRA beats BitFit · Avg. Acc. [VTAB-1K]
77.2 vs 65.2
- State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
State-offset Tuning beats BitFit · Spider (All) [Mamba 1.4B]
57.4 vs 51.25725269
- State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
State-offset Tuning beats BitFit · GLUE (Avg.) [Mamba 1.4B]
78.5 vs 77.9
- Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
DEFLECT beats BitFit · Avg. Perf. [Scale-MAE]
64.6 vs 56.5
- Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
DEFLECT beats BitFit · Avg. Perf. [DINO-MC]
64.8 vs 63.3
- Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
DEFLECT beats BitFit · Avg. Perf. [Cross-Scale MAE]
59.9 vs 56.6
- Decentralized Low-Rank Fine-Tuning of Large Language Models
Dec-LoRA beats BitFit · Average accuracy [K=1]
89.61 vs 86.42
- Decentralized Low-Rank Fine-Tuning of Large Language Models
Dec-LoRA beats BitFit · Average accuracy [K=5]
89.39 vs 86.86
- CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
CleaR_BitFit beats BitFit · Peak [Symmetric 20%]
51.9 vs 51.7
- CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
CleaR_BitFit beats BitFit · Avg [Symmetric 20%]
51.1 vs 51.0
- CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
CleaR_BitFit beats BitFit · Peak [Symmetric 40%]
51.6 vs 50.8
- CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
CleaR_BitFit beats BitFit · Avg [Symmetric 40%]
51.2 vs 48.1
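The margins in the list above vary widely, so it can help to rank them before drawing conclusions. A minimal sketch follows, using only the values listed on this page. The scores come from different benchmarks and backbones, so the margins are not comparable across rows; the ranking only shows how large each reported gap is.

```python
# (method, setting, newer method's score, BitFit's score), copied from the list above.
RESULTS = [
    ("Heart-LoRA", "Avg. Acc. [VTAB-1K]", 77.2, 65.2),
    ("State-offset Tuning", "Spider (All) [Mamba 1.4B]", 57.4, 51.25725269),
    ("State-offset Tuning", "GLUE (Avg.) [Mamba 1.4B]", 78.5, 77.9),
    ("DEFLECT", "Avg. Perf. [Scale-MAE]", 64.6, 56.5),
    ("DEFLECT", "Avg. Perf. [DINO-MC]", 64.8, 63.3),
    ("DEFLECT", "Avg. Perf. [Cross-Scale MAE]", 59.9, 56.6),
    ("Dec-LoRA", "Average accuracy [K=1]", 89.61, 86.42),
    ("Dec-LoRA", "Average accuracy [K=5]", 89.39, 86.86),
    ("CleaR_BitFit", "Peak [Symmetric 20%]", 51.9, 51.7),
    ("CleaR_BitFit", "Avg [Symmetric 20%]", 51.1, 51.0),
    ("CleaR_BitFit", "Peak [Symmetric 40%]", 51.6, 50.8),
    ("CleaR_BitFit", "Avg [Symmetric 40%]", 51.2, 48.1),
]

# Margin = newer score minus BitFit score, rounded to hide float noise.
margins = sorted(
    ((round(new - bitfit, 2), method, setting) for method, setting, new, bitfit in RESULTS),
    reverse=True,
)

for margin, method, setting in margins:
    print(f"{margin:6.2f}  {method} · {setting}")
```

The largest listed gap is 12.0 points (Heart-LoRA on VTAB-1K) and the smallest is 0.1 (CleaR_BitFit, Avg, Symmetric 20%), so "beats BitFit" covers both decisive and marginal wins.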
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 29, 2026
- May 28, 2026
- May 19, 2026
- May 15, 2026
- May 12, 2026
- May 11, 2026
- May 11, 2026
- May 8, 2026
- May 5, 2026
- May 5, 2026
- May 5, 2026
- RDP LoRA · RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models · Apr 21, 2026