LG · Jun 2, 2025

Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning

arXiv:2506.01339v2 · 16 citations · h-index: 20 · ICML
Originality: Incremental advance
AI Analysis

The paper addresses privacy and safety concerns in LLMs by making unlearning more resilient to downstream fine-tuning, though it is an incremental improvement over existing methods.

The paper tackles a weakness of machine unlearning in large language models: downstream fine-tuning can recover forgotten information. It introduces invariance through a regularization-based framework called ILU, which achieves superior robustness across diverse fine-tuning scenarios while preserving performance.

Machine unlearning offers a promising solution to privacy and safety concerns in large language models (LLMs) by selectively removing targeted knowledge while preserving utility. However, current methods are highly sensitive to downstream fine-tuning, which can quickly recover forgotten information, even when the fine-tuning task is unrelated. To address this, we introduce invariance into unlearning for the first time, inspired by invariant risk minimization (IRM). Building on this principle, we propose invariant LLM unlearning (ILU), a regularization-based framework that enhances robustness. Notably, ILU generalizes well to diverse fine-tuning tasks, even when trained using a single dataset. A task vector analysis is also provided to further elucidate the rationale behind ILU's effectiveness. Extensive experiments on the WMDP and MUSE benchmarks reveal that ILU significantly outperforms state-of-the-art unlearning methods, including negative preference optimization (NPO) and representation misdirection for unlearning (RMU). Notably, ILU achieves superior unlearning robustness across diverse downstream fine-tuning scenarios (e.g., math, paraphrase detection, and sentiment analysis) while preserving the fine-tuning performance.
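
The abstract says ILU is regularization-based and inspired by IRM, but it does not state the objective. As a rough illustration of the underlying idea only, the sketch below computes the standard IRMv1-style penalty: for each environment, the squared gradient of the risk with respect to a dummy scalar multiplier fixed at 1. It is not the ILU implementation. The squared loss, the toy data, the function names, and the reading of an "environment" as a fine-tuning dataset are all assumptions made for this example.

```python
# Illustrative sketch only: an IRMv1-style invariance penalty for a squared loss.
# This is NOT the ILU method; all names and toy data here are hypothetical.

def env_risk_grad_at_one(preds, targets):
    """Gradient d/dw of mean((w * p - y)^2), evaluated at the dummy scale w = 1."""
    n = len(preds)
    return sum(2.0 * (p - y) * p for p, y in zip(preds, targets)) / n


def irm_penalty(envs):
    """Sum over environments of the squared gradient w.r.t. the dummy scale.

    A penalty of zero means the predictor is simultaneously stationary
    in every environment, which is the invariance condition IRMv1 relaxes.
    """
    return sum(env_risk_grad_at_one(p, y) ** 2 for p, y in envs)


def regularized_objective(base_loss, envs, lam=1.0):
    """Base loss (e.g. an unlearning loss, assumed given) plus the weighted penalty."""
    return base_loss + lam * irm_penalty(envs)


# Toy environments as (predictions, targets) pairs.
invariant_envs = [([1.0, 2.0], [1.0, 2.0]), ([3.0, 0.5], [3.0, 0.5])]
shifted_envs = [([1.0, 2.0], [1.0, 2.0]), ([3.0, 0.5], [1.0, 1.5])]

penalty_invariant = irm_penalty(invariant_envs)
penalty_shifted = irm_penalty(shifted_envs)
```

In this toy setup the penalty vanishes when the predictions fit every environment and grows when one environment disagrees, so adding it to a base loss with weight `lam` discourages solutions that are optimal in only some environments.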

Code Implementations: 1 repo
