CL · Apr 20

Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning

arXiv:2601.03190 · 95.08 citations · h-index: 13 · Has Code
Predicted impact: top 16% in CL · last 90 days · Originality: Incremental advance
AI Analysis

For LLM practitioners who need a model to forget sensitive knowledge, PALU reduces utility degradation by localizing unlearning to critical tokens and logits.

PALU proposes a prefix-aware localized unlearning method for LLMs that focuses on suppressing sensitive prefixes and flattening top-k logits, achieving superior forgetting efficacy and utility preservation compared to baselines.

Machine unlearning aims to forget sensitive knowledge from Large Language Models (LLMs) while maintaining general utility. However, existing approaches typically treat all tokens in a response indiscriminately and enforce uncertainty over the entire vocabulary. This global treatment results in unnecessary utility degradation and extends optimization to content-agnostic regions. To address these limitations, we propose PALU (Prefix-Aware Localized Unlearning), a framework driven by a local entropy maximization objective across both temporal and vocabulary dimensions. PALU reveals that (i) suppressing the sensitive prefix alone is sufficient to sever the causal generation link, and (ii) flattening only the top-$k$ logits is adequate to maximize uncertainty in the critical subspace. These findings allow PALU to alleviate redundant optimization across the full vocabulary and parameter space while minimizing collateral damage to general model performance. Extensive experiments validate that PALU achieves superior forgetting efficacy and utility preservation compared to state-of-the-art baselines. Our code is available at https://github.com/nxZhai/PALU.
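The two localization ideas in the abstract (act only on the sensitive prefix, and flatten only the top-$k$ logits) can be illustrated with a small sketch. This is not the authors' implementation, which is in the linked repository. The function names, the `prefix_len` and `k` parameters, and the use of a renormalized top-k softmax are all assumptions made for illustration; the abstract does not give the exact objective.

```python
import math

# Illustrative sketch only: names and the exact form of the objective are
# assumptions, not taken from the PALU paper or code.

def topk_entropy(logits, k):
    """Entropy (in nats) of the softmax renormalized over the k largest logits."""
    top = sorted(logits, reverse=True)[:k]
    m = max(top)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in top]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def prefix_local_entropy(logit_rows, prefix_len, k):
    """Mean top-k entropy over the first `prefix_len` response positions only."""
    rows = logit_rows[:prefix_len]
    if not rows:
        raise ValueError("prefix_len must select at least one position")
    return sum(topk_entropy(r, k) for r in rows) / len(rows)

def local_unlearning_loss(logit_rows, prefix_len, k):
    """Loss to minimize: the negative local entropy (so entropy is maximized)."""
    return -prefix_local_entropy(logit_rows, prefix_len, k)

# One row of logits per response position (toy vocabulary of 5 tokens).
peaked = [10.0, 0.0, 0.0, 0.0, -5.0]   # confident prediction, low entropy
flat = [1.0, 1.0, 1.0, -5.0, -5.0]     # top-3 logits equal, maximal top-3 entropy

loss_peaked_prefix = local_unlearning_loss([peaked, flat], prefix_len=1, k=3)
loss_flat_prefix = local_unlearning_loss([flat, peaked], prefix_len=1, k=3)
```

In this toy setup the loss is lower when the prefix position already has flat top-3 logits, and positions after the prefix do not contribute at all, which is the sense in which the objective is local in both the temporal and the vocabulary dimension.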
