CL · Dec 21, 2024

Chained Tuning Leads to Biased Forgetting

arXiv:2412.16469v2 · 3 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses safety degradation in deployed LLMs. It is an incremental advance in understanding and mitigating biased forgetting in continual learning settings.

The paper tackles catastrophic forgetting in large language models during sequential fine-tuning. It shows that safety tuning is forgotten more when it is applied before downstream tasks, and that this forgetting disproportionately affects safety information about certain groups. It quantifies the disparity with a new metric called biased forgetting.

Large language models (LLMs) are often fine-tuned for use on downstream tasks, though this can degrade capabilities learned during previous training. This phenomenon, often referred to as catastrophic forgetting, has important potential implications for the safety of deployed models. In this work, we first show that models trained on downstream tasks forget their safety tuning to a greater extent than models trained in the opposite order. Second, we show that forgetting disproportionately impacts safety information about certain groups. To quantify this phenomenon, we define a new metric we term biased forgetting. We conduct a systematic evaluation of the effects of task ordering on forgetting and apply mitigations that can help the model recover from the forgetting observed. We hope our findings can better inform methods for chaining the finetuning of LLMs in continual learning settings to enable training of safer and less toxic models.
