CL · AI · LG · Sep 19, 2024

LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models

arXiv:2409.13054v1 · 5 citations · h-index: 36
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of updating LLMs without retraining from scratch, which matters to developers and users dealing with outdated or harmful content. The contribution is incremental: it builds on existing unlearning and editing methods.

The paper tackles the problem of modifying large language models to unlearn outdated or problematic knowledge and integrate new information efficiently, achieving significant forgetting on unlearn sets and a 20% accuracy increase on update sets while maintaining performance on retain sets.

Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to unlearn problematic and outdated information while efficiently integrating new knowledge without retraining from scratch. Here, we propose LLM Surgery, a framework to efficiently modify LLM behaviour by optimizing a three-component objective function that: (1) performs reverse gradient on the unlearning dataset (problematic and outdated information), (2) performs gradient descent on the update dataset (new and updated information), and (3) minimizes the KL divergence on the retain dataset (a small subset of unchanged text), ensuring alignment between pretrained and modified model outputs. Due to the lack of publicly available datasets specifically tailored for our novel task, we compiled a new dataset and an evaluation benchmark. Using Llama2-7B, we demonstrate that LLM Surgery can achieve significant forgetting on the unlearn set, a 20% increase in accuracy on the update set, and maintain performance on the retain set.
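The three-component objective can be sketched as a single scalar loss. The snippet below is a minimal illustration in plain Python, not the paper's implementation. It assumes that "reverse gradient" means the unlearn-set cross-entropy enters with a negative sign (so minimizing the total loss ascends on that term), that the KL term is taken as KL(pretrained || modified), and that the three terms are combined with weights. The function names and the weights `alpha`, `beta`, `gamma` are hypothetical and do not come from the abstract.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target token under a predicted distribution."""
    return -math.log(probs[target])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def surgery_loss(unlearn_batch, update_batch, retain_batch,
                 alpha=1.0, beta=1.0, gamma=1.0):
    """Combine the three terms into one scalar (illustrative weighting).

    unlearn_batch, update_batch: lists of (model_probs, target_index)
    retain_batch: list of (pretrained_probs, model_probs)
    """
    # (1) Unlearn: negated cross-entropy, so minimizing the total loss
    #     pushes the model AWAY from the unlearn targets (assumed reading
    #     of "reverse gradient").
    l_unlearn = mean(cross_entropy(p, t) for p, t in unlearn_batch)
    # (2) Update: ordinary cross-entropy on the new information.
    l_update = mean(cross_entropy(p, t) for p, t in update_batch)
    # (3) Retain: keep the modified model close to the pretrained one
    #     (KL direction is an assumption).
    l_retain = mean(kl_divergence(ref, cur) for ref, cur in retain_batch)
    return -alpha * l_unlearn + beta * l_update + gamma * l_retain


if __name__ == "__main__":
    # Toy two-token vocabulary; all numbers are made up for illustration.
    unlearn = [([0.5, 0.5], 0)]
    update = [([0.25, 0.75], 1)]
    retain = [([0.5, 0.5], [0.5, 0.5])]
    print(surgery_loss(unlearn, update, retain))
```

In a real training loop the probabilities would come from the model's softmax outputs and the scalar would be backpropagated with an autograd framework. The sketch only shows how the signs of the three terms relate to each other.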

