Protecting the Undeleted in Machine Unlearning
This work highlights a critical privacy flaw in machine unlearning and proposes a new security definition that protects undeleted data, a foundation for secure AI systems.
The paper demonstrates that machine unlearning approaches aiming for perfect retraining pose privacy risks for undeleted data, allowing reconstruction attacks that can recover almost the entire dataset with minimal adversarial control.
Machine unlearning aims to remove specific data points from a trained model, often striving to emulate "perfect retraining", i.e., producing the model that would have been obtained had the deleted data never been included. We demonstrate that this approach, and security definitions that enable it, carry significant privacy risks for the remaining (undeleted) data points. We present a reconstruction attack showing that for certain tasks, which can be computed securely without deletions, a mechanism adhering to perfect retraining allows an adversary controlling merely $\omega(1)$ data points to reconstruct almost the entire dataset simply by issuing deletion requests. We survey existing definitions for machine unlearning, showing they are either susceptible to such attacks or too restrictive to support basic functionalities like exact summation. To address this problem, we propose a new security definition that specifically safeguards undeleted data against leakage caused by the deletion of other points. We show that our definition permits several essential functionalities, such as bulletin boards, summations, and statistical learning.
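To make the leakage channel concrete, the following is a minimal toy sketch and not the construction from the paper. It assumes a hypothetical mechanism, `release`, that publishes the exact lower median of the current dataset and, after every deletion, publishes exactly what it would have published had the deleted point never been present. The names `release`, `attack`, `LOW`, and `HIGH` are illustrative assumptions. The adversary in this sketch is far stronger than the $\omega(1)$-point adversary of the abstract (it controls $3m$ sentinel points for $m$ honest points), and an exact median is not a task that is secure even without deletions. The sketch only illustrates how deleting one's own points can expose other people's undeleted points.

```python
import statistics

# Hypothetical sentinel values the adversary contributes to the dataset.
LOW, HIGH = float("-inf"), float("inf")


def release(dataset):
    """Toy mechanism: publish the exact lower median of the current data.

    The output depends only on the current dataset, so after a deletion it
    equals the output of retraining from scratch without the deleted point.
    """
    return statistics.median_low(dataset)


def attack(honest):
    """Recover the honest values using only deletions of adversarial points.

    Assumes `honest` is non-empty and holds distinct finite numbers. The
    adversary adds m LOW and 2m HIGH sentinels, then deletes the HIGH
    sentinels one at a time, observing the released median after each step.
    """
    m = len(honest)
    data = list(honest) + [LOW] * m + [HIGH] * (2 * m)
    recovered = set()
    for _ in range(2 * m + 1):
        out = release(data)
        if LOW < out < HIGH:   # the output is an honest, undeleted value
            recovered.add(out)
        if HIGH in data:
            data.remove(HIGH)  # deletion request for one adversarial point
    return sorted(recovered)


if __name__ == "__main__":
    print(attack([3, 8, 1, 9, 4]))
```

Each deletion of a `HIGH` sentinel moves the median rank down by at most one position, so the released value sweeps across every honest point even though no honest point is ever deleted.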