Is BitFit superseded?

BitFit (parameter-efficient fine-tuning, LoRA family) is heavily superseded: a standard baseline that newer methods routinely beat. 7 papers critique it and 28 beat it on benchmarks, which places it #5 of 1,113 most-superseded. Sub-problem: the cluster led by LoRA. Newer alternatives in the same sub-problem include Balanced LoRA, FedSmoothLoRA, FuRA, LoRA-Over, and Hybrid-LoRA.

Heavily superseded · #5 of 1,113 most-superseded

BitFit

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

Parameter-efficient fine-tuning (LoRA family) · first seen Jun 18, 2021

heavily superseded — a standard baseline that newer methods routinely beat

7 papers critique it · 28 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites BitFit as a baseline.

“Yet, because weight matrices are not changed the bias terms remain constant across inputs, variation between samples are not taken into account during fine-tuning. As a result, they may not reach the performance achieved by other methods, such as Low-Rank Adaptation (LoRA) hu2021lora, pu2023empirical.”
— 1LoRA: Summation Compression for Very Low-Rank Adaptation
“Unlike the previous literature like Liu2022FewShotPF or BenZaken2021BitFitSP, we introduce a novel prompt-aware mechanism to the PEFT method.”
— PARA: Parameter-Efficient Fine-tuning with Prompt Aware Representation Adjustment
“This includes methods previously considered less effective, such as BitFit zaken2022bitfit, which FT only the bias terms of the frozen backbone.”
— Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
“Research based on sparse methods zaken2021bitfit optimizes the bias parameter to reduce cost and improve the performance of the model during specific tasks but faces problems in dealing with real data.”
— Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
“BitFit is susceptible to catastrophic forgetting when the target task has a large zero-shot silhouette score”
— Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP
“We hypothesize, however, that the bias may not necessarily be the optimal component of BERT for parameter-efficient fine-tuning, and similar/better performance could be obtained by training a smaller number of parameters if the optimal component is chosen.”
— LayerNorm: A key component in parameter-efficient fine-tuning
“BitFit method zaken2021bitfit updates only the bias parameters, resulting in a substantial reduction in the number of trainable parameters, but at the cost of suboptimal performance.”
— PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
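
Several of the critiques above describe BitFit as updating only the bias terms of a frozen backbone. The selection rule can be sketched in plain Python. This is an illustration, not the authors' implementation: the function names, the `toy_layer` inventory, and the convention that parameter names end in `weight` or `bias` are all assumptions made for the example.

```python
from math import prod


def bitfit_trainable(param_shapes):
    """Return the parameter names a bias-only scheme would train.

    Assumes names are dot-separated and that bias vectors end in 'bias'.
    """
    return [name for name in param_shapes if name.split(".")[-1] == "bias"]


def trainable_fraction(param_shapes):
    """Share of all scalar parameters that the bias-only rule leaves trainable."""
    total = sum(prod(shape) for shape in param_shapes.values())
    trained = sum(prod(param_shapes[name]) for name in bitfit_trainable(param_shapes))
    return trained / total


# Hypothetical inventory: two dense projections with BERT-base-like sizes.
toy_layer = {
    "attention.query.weight": (768, 768),
    "attention.query.bias": (768,),
    "ffn.dense.weight": (3072, 768),
    "ffn.dense.bias": (3072,),
}

print(bitfit_trainable(toy_layer))
print(f"{trainable_fraction(toy_layer):.4%} of parameters are trainable")
```

For dense layers with 768 inputs, the trainable share works out to 1/769 of the layer's parameters. That is the "substantial reduction in the number of trainable parameters" the quotes refer to, and it also shows why the weight matrices themselves never change.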

Beaten on benchmarks

Head-to-head results where a newer method reports beating BitFit. Values are copied from the source paper's tables — verify against the cited paper.

Heart-LoRA beats BitFit · Avg. Acc. [VTAB-1K]
77.2 vs 65.2
Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness across Diverse Tasks
State-offset Tuning beats BitFit · Spider (All) [Mamba 1.4B]
57.4 vs 51.25725269
State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
State-offset Tuning beats BitFit · GLUE (Avg.) [Mamba 1.4B]
78.5 vs 77.9
State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
DEFLECT beats BitFit · Avg. Perf. [Scale-MAE]
64.6 vs 56.5
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
DEFLECT beats BitFit · Avg. Perf. [DINO-MC]
64.8 vs 63.3
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
DEFLECT beats BitFit · Avg. Perf. [Cross-Scale MAE]
59.9 vs 56.6
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Dec-LoRA beats BitFit · Average accuracy [K=1]
89.61 vs 86.42
Decentralized Low-Rank Fine-Tuning of Large Language Models
Dec-LoRA beats BitFit · Average accuracy [K=5]
89.39 vs 86.86
Decentralized Low-Rank Fine-Tuning of Large Language Models
CleaR (BitFit) beats BitFit · Peak [Symmetric 20%]
51.9 vs 51.7
CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
CleaR (BitFit) beats BitFit · Avg [Symmetric 20%]
51.1 vs 51.0
CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
CleaR (BitFit) beats BitFit · Peak [Symmetric 40%]
51.6 vs 50.8
CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
CleaR (BitFit) beats BitFit · Avg [Symmetric 40%]
51.2 vs 48.1
CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
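
The size of each win in the list above can be read off with a little arithmetic. The sketch below computes the absolute gap for each head-to-head result, using the values exactly as listed. The `results` list and the `margins` helper are names invented for this example, and the rows come from different benchmarks and backbones, so the gaps are not comparable across rows.

```python
# (newer method, metric [setting], newer score, BitFit score), as listed above
results = [
    ("Heart-LoRA", "Avg. Acc. [VTAB-1K]", 77.2, 65.2),
    ("State-offset Tuning", "Spider (All) [Mamba 1.4B]", 57.4, 51.25725269),
    ("State-offset Tuning", "GLUE (Avg.) [Mamba 1.4B]", 78.5, 77.9),
    ("DEFLECT", "Avg. Perf. [Scale-MAE]", 64.6, 56.5),
    ("DEFLECT", "Avg. Perf. [DINO-MC]", 64.8, 63.3),
    ("DEFLECT", "Avg. Perf. [Cross-Scale MAE]", 59.9, 56.6),
    ("Dec-LoRA", "Average accuracy [K=1]", 89.61, 86.42),
    ("Dec-LoRA", "Average accuracy [K=5]", 89.39, 86.86),
    ("CleaR (BitFit)", "Peak [Symmetric 20%]", 51.9, 51.7),
    ("CleaR (BitFit)", "Avg [Symmetric 20%]", 51.1, 51.0),
    ("CleaR (BitFit)", "Peak [Symmetric 40%]", 51.6, 50.8),
    ("CleaR (BitFit)", "Avg [Symmetric 40%]", 51.2, 48.1),
]


def margins(rows):
    """Return (gap, method, setting) tuples, largest gap first.

    Gaps are rounded to two decimals to hide floating-point noise.
    """
    gaps = [(round(new - old, 2), method, setting) for method, setting, new, old in rows]
    return sorted(gaps, reverse=True)


for gap, method, setting in margins(results):
    print(f"{gap:+6.2f}  {method}  {setting}")
```

On these twelve rows the gap ranges from 0.1 to 12.0 points, and four of the wins are smaller than one point, so "beats" covers both marginal and large differences.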

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.