CL · LG · May 4

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

arXiv:2605.03229
AI Analysis

For practitioners adapting large language models to new tasks, SMF offers a low-forgetting alternative to LoRA and full finetuning, though its gains on the target task are smaller than theirs.

Sparse Memory Finetuning (SMF) adds key-value memory layers to a pretrained model and updates only the most relevant memory rows in each batch. It gains 2.5 percentage points on MedMCQA while keeping the forgetting probes (WikiText perplexity and TriviaQA accuracy) within about 1 point of the base model. LoRA and full finetuning, by contrast, cause larger drift.
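To make "update only the most relevant rows" concrete, here is a minimal, framework-free sketch of one sparse update step. It assumes that each token's memory lookup returns a list of row indices. It ranks rows by raw read count, which is a simplification. The function names `select_rows` and `sparse_update` are hypothetical and do not come from the paper's code. A real implementation would work on tensors and mask gradients inside the optimizer.

```python
from collections import Counter

def select_rows(batch_reads, t):
    """Return the t memory rows the batch reads most often.

    batch_reads: list of lists; each inner list holds the row indices
    one token retrieved from the memory layer (its top-k lookups).
    Ranking by raw count is an illustrative assumption.
    """
    counts = Counter(idx for token_reads in batch_reads for idx in token_reads)
    # Sort by count (descending) and break ties by row index for determinism.
    ranked = sorted(counts, key=lambda idx: (-counts[idx], idx))
    return set(ranked[:t])

def sparse_update(memory, grads, trainable_rows, lr):
    """Plain SGD step applied only to the selected rows; every other row stays frozen."""
    for idx in trainable_rows:
        memory[idx] = [w - lr * g for w, g in zip(memory[idx], grads[idx])]
    return memory

# Usage: row 2 is read 3 times; rows 0 and 5 are read twice each (tie broken by index).
reads = [[0, 2], [2, 5], [2, 0], [3, 5]]
rows = select_rows(reads, t=2)  # {0, 2}
memory = sparse_update([[0.0, 0.0] for _ in range(6)],
                       {i: [1.0, -1.0] for i in range(6)}, rows, lr=0.1)
```

Because only `t` rows change per step, most of the memory, and the whole frozen backbone, keeps its pretrained values. This is the intuition behind the low-forgetting behaviour described above.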

Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updating only the small set of memory rows that the current batch reads most heavily. We re-implement SMF on Qwen-2.5-0.5B-Instruct and compare it with LoRA and full finetuning on MedMCQA, a 4-choice medical exam task, using WikiText perplexity and TriviaQA accuracy as forgetting probes. SMF improves MedMCQA by 2.5 percentage points while keeping both forgetting probes within roughly 1 point of the base model, whereas LoRA and full finetuning achieve larger gains but with clear drift on both. We also compare two row-selection rules (KL-divergence and TF-IDF), which balance the two forgetting metrics differently.
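The two row-selection rules can be sketched as scoring functions over a batch's memory reads, compared against background access statistics. This sketch rests on several assumptions, since the exact formulas are not given here. For TF-IDF, the term frequency is the row's share of batch reads and the document frequency counts how many background batches read that row. For KL, each row is scored by its term p·log(p/q) in KL(p_batch ‖ q_background). The function names and the smoothing constant are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(batch_reads, background_doc_freq, num_background_batches):
    """Row score = (share of this batch's reads) * log-inverse background batch frequency."""
    counts = Counter(batch_reads)
    total = sum(counts.values())
    scores = {}
    for idx, c in counts.items():
        df = background_doc_freq.get(idx, 0)  # background batches that read row idx
        idf = math.log((num_background_batches + 1) / (df + 1))
        scores[idx] = (c / total) * idf
    return scores

def kl_scores(batch_reads, background_counts, eps=1e-6):
    """Row score = its contribution p * log(p / q) to KL(p_batch || q_background)."""
    counts = Counter(batch_reads)
    total = sum(counts.values())
    bg_total = sum(background_counts.values()) or 1
    scores = {}
    for idx, c in counts.items():
        p = c / total
        q = max(background_counts.get(idx, 0) / bg_total, eps)  # floor avoids log of zero
        scores[idx] = p * math.log(p / q)
    return scores

def top_rows(scores, t):
    """Pick the t highest-scoring rows (ties broken by row index)."""
    return set(sorted(scores, key=lambda idx: (-scores[idx], idx))[:t])

# Usage: row 1 is read often in this batch but is also common in background data,
# so both rules prefer the batch-specific rows 7 and 3 over it.
reads = [7, 7, 7, 1, 1, 3]
print(top_rows(tfidf_scores(reads, {1: 99, 3: 1, 7: 2}, 100), 2))
print(top_rows(kl_scores(reads, {1: 980, 3: 10, 7: 10}), 2))
```

Both rules down-weight rows that general text already uses heavily. Such rows are the ones most likely to encode existing knowledge, so this is one plausible way the rules limit drift on the forgetting probes. How the two rules trade off WikiText against TriviaQA is an empirical result of the paper and is not something this toy example reproduces.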

