CL, AI | Aug 8, 2024

UNLEARN Efficient Removal of Knowledge in Large Language Models

arXiv:2408.04140v1 | 14 citations | h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the need to make LLMs forget private or proprietary information, which matters for practical deployment. The advance is incremental because it builds on existing subspace methods.

The paper tackles the problem of efficiently removing specific knowledge from large language models without retraining. It proposes UNLEARN, which removes 96% of targeted knowledge while keeping performance on other knowledge within 2.5% of the original model.

Given the prevalence of large language models (LLMs) and the prohibitive cost of training these models from scratch, dynamically forgetting specific knowledge (e.g., private or proprietary information) without retraining the model has become an important capability. This paper proposes a novel method, called UNLEARN, to achieve this objective. The approach builds upon subspace methods to identify and specifically target the removal of knowledge without adversely affecting other knowledge in the LLM. Results demonstrate that 96% of targeted knowledge can be forgotten while maintaining performance on other knowledge within 2.5% of the original model, significantly outperforming the discriminatory abilities of the previous state of the art. A dual method called LEARN is also proposed for targeted knowledge addition. Results show LEARN can match the fine-tuning accuracy of Low-Rank Adaptation (LoRA) without adversely affecting similar tasks.
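
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind subspace-based removal, not the paper's implementation. It assumes that the knowledge to forget and the knowledge to keep are each summarized by a weight-update matrix for a single layer, that a low-rank subspace can be read off with an SVD, and that directions shared with the retained knowledge should be excluded before removal. The function names (`top_subspace`, `remove_overlap`, `project_out`), the rank, and the toy matrices are all hypothetical.

```python
import numpy as np


def top_subspace(delta, rank):
    """Orthonormal basis (as columns) for the top-`rank` left singular directions."""
    u, _, _ = np.linalg.svd(delta, full_matrices=False)
    return u[:, :rank]


def remove_overlap(forget_basis, retain_basis, tol=1e-8):
    """Drop from the forget subspace any direction shared with the retain subspace."""
    # Project the retain directions out of every forget direction.
    residual = forget_basis - retain_basis @ (retain_basis.T @ forget_basis)
    # Re-orthonormalize and discard directions that vanished.
    u, s, _ = np.linalg.svd(residual, full_matrices=False)
    return u[:, s > tol]


def project_out(weight, basis):
    """Remove the part of the weight's column space that lies in span(basis)."""
    return weight - basis @ (basis.T @ weight)


# Toy data (assumption): the forget update lives in span(e0, e1),
# the retain update lives in span(e1, e2), so only e0 is safe to remove.
rng = np.random.default_rng(0)
dim = 8
eye = np.eye(dim)
forget_delta = eye[:, [0, 1]] @ rng.standard_normal((2, dim))
retain_delta = eye[:, [1, 2]] @ rng.standard_normal((2, dim))

forget_basis = top_subspace(forget_delta, rank=2)
retain_basis = top_subspace(retain_delta, rank=2)
discriminated = remove_overlap(forget_basis, retain_basis)

weight = rng.standard_normal((dim, dim))
edited = project_out(weight, discriminated)
```

In this toy setup only the direction unique to the forget update is removed from `weight`, while the direction it shares with the retain update is left untouched. That is the intuition behind removing targeted knowledge without harming similar knowledge.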

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
