LG CL · Oct 25, 2025

Label Smoothing Improves Gradient Ascent in LLM Unlearning

Zirui Pang, Hao Zheng, Zhijie Deng, Ling Li, Zixin Zhong, Jiaheng Wei

arXiv:2510.22376v1 · 2 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses degraded model utility in LLM unlearning for AI safety applications. It is an incremental improvement over existing methods.

The paper tackles the instability of Gradient Ascent (GA) in LLM unlearning by proposing Smoothed Gradient Ascent (SGA), which combines the forget data with multiple constructed normal data through a tunable smoothing rate. On three benchmarks, SGA is more stable than GA, outperforms it across all metrics, and ranks in the top two among baseline methods on several key metrics.

LLM unlearning has emerged as a promising approach that aims to make models forget hazardous or undesired knowledge at low cost while preserving as much model utility as possible. Among existing techniques, the most straightforward is to perform Gradient Ascent (GA) w.r.t. the forget data, thereby forcing the model to unlearn the forget dataset. However, GA suffers from severe instability: it drives updates in a divergent direction, which often drastically degrades model utility. To address this issue, we propose Smoothed Gradient Ascent (SGA). SGA combines the forget data with multiple constructed normal data through a tunable smoothing rate. Intuitively, this extends GA from learning solely on the forget data to jointly learning across both forget and normal data, enabling more stable unlearning while better preserving model utility. Theoretically, we provide guidance on selecting the optimal smoothing rate. Empirically, we evaluate SGA on three benchmarks: TOFU, Harry Potter, and MUSE-NEWS. Experimental results demonstrate that SGA consistently outperforms the original GA method across all metrics and achieves top-2 performance among all baseline methods on several key metrics.
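The abstract does not give the SGA objective itself, so the following is only a minimal sketch of one plausible reading: a label-smoothing-style mix of a gradient-ascent term on the forget sample and an ordinary descent term averaged over the constructed normal samples. The function names (`nll`, `sga_loss`), the `(1 - rate)` / `rate` weighting, and the toy probability vectors are all assumptions made for illustration, not the paper's formulation.

```python
import math


def nll(probs, target):
    """Negative log-likelihood of `target` under a probability vector."""
    return -math.log(probs[target])


def sga_loss(forget_probs, forget_target, normal_pairs, rate):
    """Hypothetical smoothed objective, to be minimized.

    forget_probs / forget_target: model probabilities and label for a forget sample.
    normal_pairs: list of (probs, target) for constructed normal samples.
    rate: smoothing rate in [0, 1]; rate = 0 recovers plain gradient ascent.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    # Gradient ascent on the forget sample = minimizing the negated NLL.
    forget_term = -nll(forget_probs, forget_target)
    # Ordinary descent term, averaged over the normal samples (assumed weighting).
    if normal_pairs:
        normal_term = sum(nll(p, t) for p, t in normal_pairs) / len(normal_pairs)
    else:
        normal_term = 0.0
    return (1.0 - rate) * forget_term + rate * normal_term


# Toy usage: one forget sample, one normal sample.
ga_only = sga_loss([0.5, 0.5], 0, [], rate=0.0)
smoothed = sga_loss([0.5, 0.5], 0, [([0.25, 0.75], 1)], rate=0.5)
```

Under this assumed form, the pure GA loss is unbounded below as the forget probability goes to zero, while the normal-data term adds a penalty that grows if the model's predictions on normal data deteriorate. That is one way to picture the stabilizing effect the abstract describes.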
