Provable Unlearning with Gradient Ascent on Two-Layer ReLU Neural Networks
This work addresses privacy and ethical concerns in AI by enabling efficient data removal without full retraining. Its contribution is incremental, since it builds on existing gradient ascent methods.
The paper tackles machine unlearning by analyzing gradient ascent as a method for removing specific data points from trained models. It shows that gradient ascent satisfies a new success criterion and approximates the retrained solution on the retained data, with theoretical guarantees for linear models and two-layer neural networks.
Machine unlearning aims to remove specific data from trained models, addressing growing privacy and ethical concerns. We provide a theoretical analysis of gradient ascent, a simple and widely used method for reversing the influence of a specific data point without retraining from scratch. Leveraging the implicit bias of gradient descent towards solutions that satisfy the Karush-Kuhn-Tucker (KKT) conditions of a margin maximization problem, we quantify the quality of the unlearned model by evaluating how well it satisfies these conditions w.r.t. the retained data. To formalize this idea, we propose a new success criterion, termed $(ε, δ, τ)$-successful unlearning, and show that, for both linear models and two-layer neural networks with high-dimensional data, a properly scaled gradient-ascent step satisfies this criterion and yields a model that closely approximates the retrained solution on the retained data. We also show that gradient ascent performs successful unlearning while still preserving generalization in a synthetic Gaussian-mixture setting.
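To make the gradient-ascent idea concrete, the following is a minimal sketch for the linear case only. It assumes a logistic loss, random Gaussian data with dimension much larger than the sample count, and a hand-picked `step_size`; none of these choices come from the paper, and the sketch does not reproduce the paper's step scaling or its $(ε, δ, τ)$ criterion. The function names (`logistic_grad`, `train`, `unlearn_point`, `cosine`) are hypothetical.

```python
import numpy as np


def logistic_grad(theta, X, y):
    """Gradient of the mean logistic loss log(1 + exp(-y * <theta, x>))."""
    margins = y * (X @ theta)
    coeffs = -y / (1.0 + np.exp(margins))  # per-sample d(loss)/d(score)
    return (X * coeffs[:, None]).mean(axis=0)


def train(X, y, steps=500, lr=0.5):
    """Plain gradient descent from a zero initialization (illustrative settings)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * logistic_grad(theta, X, y)
    return theta


def unlearn_point(theta, x_forget, y_forget, step_size):
    """One gradient-ascent step on the loss of the point to forget.

    step_size is a free parameter here (an assumption of this sketch),
    not the scaling analyzed in the paper.
    """
    grad = logistic_grad(theta, x_forget[None, :], np.array([y_forget]))
    return theta + step_size * grad


def cosine(a, b):
    """Cosine similarity between two parameter vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 20, 200  # high-dimensional regime: d much larger than n
    X = rng.standard_normal((n, d))
    y = rng.choice([-1.0, 1.0], size=n)

    theta = train(X, y)
    theta_unlearned = unlearn_point(theta, X[0], y[0], step_size=0.1)
    theta_retrained = train(X[1:], y[1:])  # reference: retrain without the point

    print("forgotten margin before:", y[0] * (X[0] @ theta))
    print("forgotten margin after: ", y[0] * (X[0] @ theta_unlearned))
    print("cosine(unlearned, retrained):", cosine(theta_unlearned, theta_retrained))
```

The ascent step moves the parameters along the forgotten point's direction, so its margin strictly decreases for any positive step size. Comparing against `theta_retrained` is only an informal check of closeness to the retrained solution, not the formal criterion defined in the paper.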