LG · May 7

Retain-Neutral Surrogates for Min-Max Unlearning

Junhao Cai, Dohun Kim, Dowon Kim, Sung Il Choi, Chengjun Jin, Juhyun Park, Changhee Joo

arXiv:2605.05871 · 76.2 · h-index: 14

AI Analysis

For practitioners of machine unlearning, this work provides a principled method to limit the increase in retain loss that arises when forget and retain gradients are strongly coupled.

The paper addresses a failure mode of min-max machine unlearning: retain loss increases when forget and retain gradients are aligned. It proposes ROSU, which constrains surrogate construction so that the perturbation causes no first-order change in retain loss. On CIFAR-10/100, Tiny-ImageNet, TOFU, and WMDP, the gains are clearest in high-coupling regimes, and the method remains competitive elsewhere.

Machine unlearning seeks to remove the influence of designated training data while preserving performance on the remaining data. Approximate unlearning can be viewed as a local editing problem; in min-max unlearning, the key local object is the surrogate point at which the retain objective is evaluated. When forget and retain gradients are strongly aligned, an unconstrained forget-maximizing perturbation can move to a surrogate point that increases retain loss. We propose Retain-Orthogonal Surrogate Unlearning (ROSU), which constrains the inner surrogate construction by maximizing first-order forget gain subject to zero first-order retain change under a fixed perturbation budget. This yields a closed-form retain-orthogonal perturbation, a lightweight transported outer update, and amplification along the retain-neutral direction. Our analysis establishes (i) a curvature-controlled second-order bound on retain damage, (ii) a positive-alignment regime in which ROSU strictly reduces surrogate retain loss relative to standard min-max perturbations, and (iii) near-equivalence when the two gradients are nearly orthogonal. Across vision and language benchmarks (CIFAR-10/100, Tiny-ImageNet, TOFU, WMDP), the empirical pattern follows this geometry: ROSU gives its clearest gains in high-coupling regimes while remaining competitive elsewhere.
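The constrained inner problem described in the abstract (maximize first-order forget gain, subject to zero first-order retain change, under a fixed perturbation budget) has a standard closed-form solution: project the forget gradient onto the orthogonal complement of the retain gradient, then rescale to the budget. The following is a minimal NumPy sketch of that derivation only, not the authors' implementation. The function names, the `eps` tolerance, and the toy gradients are assumptions made for illustration, and the transported outer update and amplification step mentioned in the abstract are not reproduced here.

```python
import numpy as np


def retain_orthogonal_perturbation(g_forget, g_retain, rho, eps=1e-12):
    """Sketch: solve max <g_forget, d> s.t. <g_retain, d> = 0 and ||d||_2 <= rho.

    Hypothetical helper; names and the eps guard are illustrative assumptions.
    """
    g_f = np.asarray(g_forget, dtype=float)
    g_r = np.asarray(g_retain, dtype=float)
    r_sq = g_r @ g_r
    if r_sq > eps:
        # Remove the component of the forget gradient along the retain gradient.
        g_perp = g_f - (g_f @ g_r) / r_sq * g_r
    else:
        # No usable retain gradient: the constraint is vacuous.
        g_perp = g_f.copy()
    norm = np.linalg.norm(g_perp)
    if norm <= eps:
        # Forget gradient is parallel to the retain gradient: no feasible gain.
        return np.zeros_like(g_f)
    return rho * g_perp / norm


def standard_perturbation(g_forget, rho, eps=1e-12):
    """Unconstrained forget-maximizing perturbation, for comparison."""
    g_f = np.asarray(g_forget, dtype=float)
    norm = np.linalg.norm(g_f)
    return np.zeros_like(g_f) if norm <= eps else rho * g_f / norm


# Toy gradients with positive alignment (cosine similarity about 0.71).
g_f = np.array([1.0, 1.0, 0.0])
g_r = np.array([1.0, 0.0, 0.0])
rho = 0.1

delta_rosu = retain_orthogonal_perturbation(g_f, g_r, rho)
delta_std = standard_perturbation(g_f, rho)

# First-order retain change is <g_r, delta>; first-order forget gain is <g_f, delta>.
print("retain change (orthogonal):", g_r @ delta_rosu)
print("retain change (standard):  ", g_r @ delta_std)
print("forget gain (orthogonal):  ", g_f @ delta_rosu)
print("forget gain (standard):    ", g_f @ delta_std)
```

In this toy case the orthogonal perturbation leaves the retain loss unchanged to first order, while the standard perturbation increases it. The price is a smaller first-order forget gain for the same budget. When the two gradients are orthogonal, the projection changes nothing and the two perturbations coincide, which matches the near-equivalence regime the abstract describes.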
