ASLG · Feb 19, 2024

Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting

arXiv:2402.12220v3 · 14 citations · h-index: 3 · IEEE/ACM Transactions on Audio, Speech, and Language Processing
Originality: Incremental advance
AI Analysis

This work addresses catastrophic forgetting in parameter-efficient fine-tuning (PEFT), a problem relevant to researchers and practitioners adapting models for tasks such as language modeling and speech synthesis. It is an incremental advance: it adapts existing Bayesian methods to PEFT.

The paper tackles catastrophic forgetting in parameter-efficient fine-tuning (PEFT) by using Bayesian learning techniques, specifically Laplace approximations, to regularize PEFT methods such as LoRA. It shows that this approach overcomes forgetting without degrading fine-tuning performance, and that Kronecker-factored approximations preserve pre-training knowledge better than diagonal ones.
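A Laplace approximation treats the pre-trained weights as the mean of a Gaussian posterior, which turns into a quadratic penalty on how far fine-tuning moves them. The block below is a standard formulation for one LoRA-adapted linear layer, written out as an assumption for illustration; the paper's exact objective, scaling, and curvature estimator may differ. Here $\mathbf{W}_0$ is the frozen pre-trained weight, $\mathbf{B}\in\mathbb{R}^{d_{\text{out}}\times r}$ and $\mathbf{A}\in\mathbb{R}^{r\times d_{\text{in}}}$ are the LoRA factors, $\mathbf{F}$ is a curvature (Hessian or Fisher) approximation of the pre-training loss at $\mathbf{W}_0$, $\lambda$ is a hypothetical regularization strength, $\operatorname{vec}$ stacks columns, and in the Kronecker-factored (K-FAC) case $\boldsymbol{\Omega}$ and $\boldsymbol{\Gamma}$ are the second-moment matrices of the layer inputs and of the gradients with respect to the layer outputs.

```latex
\begin{aligned}
\Delta\mathbf{W} &= \tfrac{\alpha}{r}\,\mathbf{B}\mathbf{A}, \\
\mathcal{L}(\mathbf{A},\mathbf{B}) &= \mathcal{L}_{\text{ft}}(\mathbf{W}_0 + \Delta\mathbf{W})
  + \tfrac{\lambda}{2}\,\operatorname{vec}(\Delta\mathbf{W})^{\top}\mathbf{F}\,\operatorname{vec}(\Delta\mathbf{W}), \\
\text{diagonal: } \mathbf{F} \approx \operatorname{diag}(\mathbf{f})
  &\;\Rightarrow\; \tfrac{\lambda}{2}\textstyle\sum_{i,j} f_{ij}\,\Delta W_{ij}^{2}, \\
\text{K-FAC: } \mathbf{F} \approx \boldsymbol{\Omega}\otimes\boldsymbol{\Gamma}
  &\;\Rightarrow\; \tfrac{\lambda}{2}\operatorname{tr}\!\big(\Delta\mathbf{W}^{\top}\boldsymbol{\Gamma}\,\Delta\mathbf{W}\,\boldsymbol{\Omega}\big).
\end{aligned}
```

Because $\Delta\mathbf{W}$ is a differentiable function of $\mathbf{A}$ and $\mathbf{B}$, the penalty's gradient reaches the LoRA parameters directly. The diagonal form ignores correlations between weights, while the K-FAC form keeps per-layer correlations at the cost of two small matrices instead of one $d_{\text{in}}d_{\text{out}}\times d_{\text{in}}d_{\text{out}}$ matrix.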

We are motivated primarily by the adaptation of text-to-speech synthesis models; however, we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework for such adaptation. Nevertheless, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we use established Laplace approximations, including diagonal and Kronecker-factored approaches, to regularize PEFT with low-rank adaptation (LoRA) and compare their performance in pre-training knowledge preservation. Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance, and that the Kronecker-factored approximation preserves pre-training knowledge better than the diagonal ones.
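The following NumPy sketch is an illustration under stated assumptions, not the authors' released code. It computes the LoRA parameter shift ΔW = (alpha / r) · B · A for one linear layer and evaluates two Laplace-style quadratic penalties on it: a diagonal one, and a Kronecker-factored (K-FAC) one in which the Fisher is approximated as the Kronecker product of an input-activation second-moment matrix (`a_cov`) and an output-gradient second-moment matrix (`g_cov`). The function names, the strength `lam`, the scaling `alpha`, and the random curvature factors are all hypothetical. NumPy has no automatic differentiation; in a framework such as PyTorch the same expressions are differentiable with respect to the LoRA factors, which is what allows the penalty to regularize them during fine-tuning.

```python
import numpy as np


def lora_shift(lora_B, lora_A, alpha):
    """Weight shift Delta W = (alpha / r) * B @ A added by a LoRA adapter.

    lora_B: (d_out, r); lora_A: (r, d_in). The pre-trained weight W0 stays frozen,
    so Delta W is exactly the parameter shift away from the pre-trained value.
    """
    r = lora_A.shape[0]
    return (alpha / r) * lora_B @ lora_A


def diagonal_laplace_penalty(delta_w, fisher_diag, lam):
    """(lam / 2) * sum_ij F_ij * dW_ij^2 with a diagonal Fisher F shaped like W0."""
    return 0.5 * lam * np.sum(fisher_diag * delta_w ** 2)


def kfac_laplace_penalty(delta_w, a_cov, g_cov, lam):
    """(lam / 2) * tr(dW^T g_cov dW a_cov) for a Fisher F ~ kron(a_cov, g_cov).

    a_cov: (d_in, d_in) second moment of layer inputs.
    g_cov: (d_out, d_out) second moment of gradients w.r.t. layer outputs.
    """
    return 0.5 * lam * np.trace(delta_w.T @ g_cov @ delta_w @ a_cov)


rng = np.random.default_rng(0)
d_out, d_in, r = 4, 3, 2
lora_B = rng.normal(size=(d_out, r))  # hypothetical trained LoRA factors
lora_A = rng.normal(size=(r, d_in))
delta_w = lora_shift(lora_B, lora_A, alpha=4.0)

# Hypothetical K-FAC factors built from random "activations" and "output gradients".
acts = rng.normal(size=(32, d_in))
grads = rng.normal(size=(32, d_out))
a_cov = acts.T @ acts / len(acts)
g_cov = grads.T @ grads / len(grads)

kfac = kfac_laplace_penalty(delta_w, a_cov, g_cov, lam=1.0)

# Check: equals the explicit quadratic form vec(dW)^T kron(a_cov, g_cov) vec(dW),
# with column-major (Fortran-order) vectorization of dW.
vec = delta_w.flatten(order="F")
full_fisher = np.kron(a_cov, g_cov)
explicit = 0.5 * vec @ full_fisher @ vec
print(np.isclose(kfac, explicit))  # True

# Diagonal penalty using only the diagonal of the same Kronecker matrix,
# reshaped so that entry (i, j) lines up with delta_w[i, j].
fisher_diag = np.outer(np.diag(g_cov), np.diag(a_cov))
diag_pen = diagonal_laplace_penalty(delta_w, fisher_diag, lam=1.0)
print(np.isclose(diag_pen, 0.5 * np.sum(np.diag(full_fisher) * vec ** 2)))  # True
```

The check against `np.kron` shows that the trace form matches the full quadratic form without ever building the (d_in·d_out) × (d_in·d_out) matrix, which is what keeps K-FAC affordable for large layers. The diagonal penalty reuses the diagonal of that same matrix only so the two can be compared; in practice a diagonal Fisher is typically estimated separately, for example from squared gradients.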

Code Implementations: 1 repo
