CL · Jan 6, 2024

Examining Forgetting in Continual Pre-training of Aligned Large Language Models

arXiv:2401.03129v1 · 18 citations · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This paper addresses a key problem for developers of large language models who need to update models without losing previously learned capabilities. Its contribution is incremental, however, because it characterizes forgetting rather than solving it.

The paper investigates catastrophic forgetting during continual pre-training of fine-tuned large language models, finding that it poses a non-trivial challenge, particularly with repetition issues, as shown through evaluations across output format, knowledge, and reliability dimensions.

Recent advances in Large Language Models (LLMs) have exhibited remarkable proficiency across various tasks. Given the potent applications of LLMs in numerous fields, there has been a surge in LLM development. In developing LLMs, a common practice involves continual pre-training on previously fine-tuned models. However, this can lead to catastrophic forgetting. In our work, we investigate the phenomenon of forgetting that occurs during continual pre-training on an existing fine-tuned LLM. We evaluate the impact of continual pre-training on the fine-tuned LLM across various dimensions, including output format, knowledge, and reliability. Experimental results highlight the non-trivial challenge of addressing catastrophic forgetting during continual pre-training, especially the repetition issue.
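
The repetition issue mentioned in the abstract can be made concrete with a simple score. The sketch below is illustrative only and is not taken from the paper: it assumes whitespace tokenization, and the function names `repeated_ngram_ratio` and `repetition_increase` are hypothetical. It measures the fraction of n-grams in a generated text that duplicate an earlier n-gram, then compares the mean score of model outputs before and after continual pre-training.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-grams in `text` that duplicate an earlier n-gram.

    Assumption: tokens are obtained by whitespace splitting, which is a
    simplification; a real evaluation would likely use the model tokenizer.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repetition.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def repetition_increase(before: list[str], after: list[str], n: int = 4) -> float:
    """Mean repetition ratio of `after` outputs minus that of `before` outputs.

    `before` and `after` are hypothetical lists of generations from the
    fine-tuned model and the continually pre-trained model, respectively.
    """
    def mean_ratio(outputs: list[str]) -> float:
        if not outputs:
            return 0.0
        return sum(repeated_ngram_ratio(o, n) for o in outputs) / len(outputs)

    return mean_ratio(after) - mean_ratio(before)
```

A positive `repetition_increase` would suggest that outputs became more repetitive after continual pre-training. The choice of `n` and of the tokenizer both affect the score, so values are only comparable under the same settings.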

Code Implementations: 1 repo
