LGAICLNov 21, 2025

Geometric-disentangelment Unlearning

arXiv:2511.17100v43 citationsHas Code
Originality Highly original
AI Analysis

This addresses the practical challenge of safely removing sensitive information from LLMs without degrading their overall performance, representing a novel theoretical approach to an existing bottleneck.

The paper tackles the problem of unlearning private or harmful content from large language models while minimizing collateral damage to retained knowledge, achieving up to 62% improved forgetting and 31% higher retention in experiments.

Large language models (LLMs) can internalize private or harmful content, motivating unlearning that removes a forget set while preserving retaining knowledge. However, forgetting updates often cause collateral degradation on retaining knowledge, creating a persistent trade-off. Existing LLM unlearning methods are often heuristic, and other theoretical approaches rely on offline feature constructions that do not capture update-time forget-retain interaction in LLMs. To address this limitation, we aim to develop an LLM unlearning method that reduces the forget-retain trade-off with theoretical guarantees. We take a first-principles view by formalizing "no side effects" as local retain invariance under small parameter updates, and prove an equivalence under optimizer-induced geometry: the retain loss is locally invariant if and only if the update direction is orthogonal to the subspace spanned by retain gradients. Based on the insight, we propose Geometric-disentanglement Unlearning (GU), a lightweight and theoretically grounded projection that can be plug-and-play to existing gradient-based unlearning methods to mitigate forget-retain side effects. Experiments on TOFU, MUSE, and WMDP-cyber show that GU strengthens forgetting while reducing retain drift. When added to SimNPO, it achieves up to 62\% improved forgetting Extraction Strength (ES) and 31\% higher retain ES. We open-sourced our code in https://github.com/Lemutisme/Geometric-Unlearning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes