LG · May 29

Forgetting Has Neighbors: Localized Collateral Forgetting in Machine Unlearning

arXiv:2605.3131765.5

AI Analysis

This work addresses an overlooked failure mode in machine unlearning, localized collateral forgetting, which aggregate metrics can hide and which matters to practitioners who rely on unlearned models behaving like retrained ones.

This paper investigates localized failures in machine unlearning, where an unlearned model's predictions diverge from those of a model retrained after deletion, particularly for examples geometrically close to the forgotten data. The authors show that surrogate targets used during unlearning can be inconsistent with the local prediction structure induced by retraining, and that this inconsistency propagates through shared representations, causing "localized collateral forgetting." Their proposed Local Teacher Distillation method replaces random targets with soft labels from a small teacher trained only on retained neighbors of the forget set; on CIFAR-100 partial-class deletion it brings the unlearned model substantially closer to retraining, especially near the forget set.

Machine unlearning aims to remove the influence of selected training examples without full retraining. Standard evaluations often summarize unlearning quality with aggregate metrics, such as accuracy- and forgetting-based scores, which can hide localized failures. We study this failure mode at the example level by comparing the predictions of an unlearned model to those of the model retrained after deletion. We show that this pointwise discrepancy can be highly non-uniform: for gradient-ascent and random-labeling methods, with and without retain-set fine-tuning, it grows with geometric proximity to the forget set. We call this phenomenon localized collateral forgetting. Our analysis identifies a mechanism behind the effect: surrogate targets used during unlearning can be inconsistent with the local prediction structure induced by retraining, and this inconsistency propagates through shared representations to nearby examples. Motivated by this mechanism, we propose Local Teacher Distillation, a simple mitigation strategy that replaces random targets with soft labels from a small teacher trained only on retained neighbors of the forget set. On CIFAR-100 partial-class deletion, this local teacher brings the unlearned model substantially closer to retraining, especially near the forget set, while maintaining competitive aggregate unlearning metrics.
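
The two ideas in the abstract, measuring pointwise discrepancy as a function of proximity to the forget set and building soft targets from retained neighbors, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the paper's implementation: the discrepancy is taken to be total variation distance between predicted distributions, proximity is Euclidean distance in some feature space, and the "teacher" is replaced by a plain k-nearest-neighbor label frequency instead of a trained model. All function names are hypothetical.

```python
import numpy as np

def pointwise_discrepancy(p_unlearned, p_retrained):
    """Per-example total variation distance between two predicted distributions.
    (Assumed metric; the abstract does not name one.)"""
    return 0.5 * np.abs(p_unlearned - p_retrained).sum(axis=1)

def distance_to_forget_set(features, forget_features):
    """Euclidean distance from each example to its nearest forget-set example."""
    diffs = features[:, None, :] - forget_features[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def discrepancy_by_proximity(disc, dist, n_bins=4):
    """Mean discrepancy in equal-count distance bins, nearest bin first."""
    order = np.argsort(dist)
    return np.array([disc[idx].mean() for idx in np.array_split(order, n_bins)])

def neighbor_soft_labels(forget_features, retain_features, retain_labels,
                         n_classes, k=5):
    """Soft targets for forget examples: label frequencies among the k nearest
    retained examples (a kNN stand-in for a trained local teacher)."""
    diffs = forget_features[:, None, :] - retain_features[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    nearest = np.argsort(sq_dist, axis=1)[:, :k]   # (n_forget, k)
    one_hot = np.eye(n_classes)[retain_labels]     # (n_retain, n_classes)
    return one_hot[nearest].mean(axis=1)           # (n_forget, n_classes)
```

If the first entries returned by `discrepancy_by_proximity` are noticeably larger than the last ones, discrepancy is concentrated near the forget set, which is the pattern the abstract calls localized collateral forgetting. The soft labels from `neighbor_soft_labels` would take the place of random targets in a random-labeling unlearning loss.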

