CL · CV · May 28

Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting

arXiv:2605.29498 · 87.9 · h-index: 9
Predicted impact: top 41% in CL · last 90 days · Originality: Incremental advance
AI Analysis

For practitioners fine-tuning LLMs with LoRA, this plug-and-play regularizer mitigates catastrophic forgetting without requiring replay data or architectural changes.

LoRA fine-tuning can cause forgetting of prior capabilities when the adaptation distribution differs from the original training distribution. The proposed output-space regularizer, which removes the ground-truth token and applies KL divergence only on non-target vocabulary, improves the trade-off between new learning and forgetting across various LoRA variants and backbones.

Low-Rank Adaptation (LoRA) has become one of the most widely used fine-tuning mechanisms for adapting large language models to new domains, tasks, and users. Yet adaptation performance alone can obscure an important failure mode: LoRA updates may improve performance on the target distribution while degrading prior capabilities learned during pretraining and alignment. We show that this forgetting becomes especially severe when the adaptation distribution differs substantially from the model's original training or alignment distributions. The challenge is amplified in practical settings, where the original training and alignment data are typically unavailable. Motivated by this constraint, we study how LoRA-based adaptation balances new learning against forgetting in a replay-free setting, and introduce a simple output-space regularizer that can be added directly to existing training pipelines. Our method removes the ground-truth token from both the base and adapted model distributions, renormalizes the remaining probabilities, and applies KL regularization only over the non-target vocabulary. This preserves the base model's relative preferences among alternative tokens without directly opposing the cross-entropy signal required for adaptation. Because the regularizer acts only at the loss level, it requires no replay data, architectural changes, adapter redesign, or inference-time overhead, and can be applied directly to existing LoRA variants. Across all LoRA variants tested and across various backbones, our method improves the frontier between new learning and forgetting when the adaptation distribution differs substantially from the base model's original training or alignment distributions, suggesting a broadly applicable route toward more reliable LLM updating.
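The three steps the abstract describes (drop the ground-truth token, renormalize, apply KL over what remains) can be illustrated for a single token position. The following is a minimal sketch using only the Python standard library, not the authors' implementation. The function names, the KL direction (base relative to adapted), the weight `lam`, and the smoothing constant `eps` are all assumptions introduced for illustration; the abstract does not specify them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_renormalized(probs, target):
    """Remove the ground-truth token and renormalize the rest to sum to 1."""
    rest = [p for i, p in enumerate(probs) if i != target]
    total = sum(rest)
    return [p / total for p in rest]

def non_target_kl(base_logits, adapted_logits, target, eps=1e-12):
    """KL(base || adapted) over the non-target vocabulary.

    The direction of the KL term and the eps smoothing are assumptions.
    """
    p = masked_renormalized(softmax(base_logits), target)
    q = masked_renormalized(softmax(adapted_logits), target)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def total_loss(base_logits, adapted_logits, target, lam=0.1):
    """Cross-entropy on the target plus the weighted non-target KL term.

    lam is a hypothetical regularization weight.
    """
    ce = -math.log(softmax(adapted_logits)[target])
    return ce + lam * non_target_kl(base_logits, adapted_logits, target)

# Toy vocabulary of 4 tokens; token 1 is the ground truth.
base = [1.0, 2.0, 0.5, -1.0]
only_target_changed = [1.0, 5.0, 0.5, -1.0]   # adaptation raised the target logit
alternatives_changed = [3.0, 2.0, 0.5, -1.0]  # adaptation reshuffled the alternatives

penalty_target = non_target_kl(base, only_target_changed, target=1)
penalty_alternatives = non_target_kl(base, alternatives_changed, target=1)
```

In this toy case `penalty_target` is zero up to floating-point error, because raising only the target logit leaves the renormalized distribution over the other tokens unchanged, while `penalty_alternatives` is strictly positive. This is one way to see why the regularizer does not directly oppose the cross-entropy signal: the model can put as much mass on the ground-truth token as it likes without being penalized, and is only penalized for changing the relative ranking of the alternatives. The sketch does not handle the degenerate case where the target token carries all of the probability mass.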

