LG · CR · OC · Dec 12, 2024

The Utility and Complexity of in- and out-of-Distribution Machine Unlearning

Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo

arXiv:2412.09119v2 · 18.2 · 17 citations · h-index: 39

Originality: Highly original

AI Analysis

This work addresses privacy concerns and knowledge gaps in deployed models and provides formal guarantees for unlearning, though its improvements to theoretical understanding and methods are incremental.

The paper studies machine unlearning through the trade-offs between utility, time complexity, and space complexity. It shows that a simple method, empirical risk minimization with output perturbation, works well when the data to be forgotten is in-distribution but fails when it is out-of-distribution. For the out-of-distribution case, the authors propose a new robust and noisy gradient descent variant that reduces unlearning time complexity.
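To make the in- versus out-of-distribution contrast concrete, here is a minimal sketch, assuming ridge regression as the ERM and the classic Gaussian-mechanism noise scale for output perturbation; the paper's loss class, noise calibration, and certification may differ. The helper names (`fit_ridge`, `gaussian_sigma`, `output_perturbation`, `shift_after_removal`) and the `sensitivity_bound` value are hypothetical. The sketch measures how far the ERM minimizer moves when a small in-distribution or out-of-distribution forget set is removed.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Regularized ERM: argmin_w (1/n)||Xw - y||^2 + lam*||w||^2, in closed form."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def gaussian_sigma(sensitivity, eps, delta):
    """Classic Gaussian-mechanism noise scale (stated for 0 < eps < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps


def output_perturbation(w, sigma, rng):
    """Release w plus isotropic Gaussian noise N(0, sigma^2 I)."""
    return w + rng.normal(0.0, sigma, size=w.shape)


rng = np.random.default_rng(0)
n, d, k, lam = 1000, 5, 10, 0.01
w_true = rng.normal(size=d)
X_r = rng.normal(size=(n, d))
y_r = X_r @ w_true + 0.1 * rng.normal(size=n)
w_retain = fit_ridge(X_r, y_r, lam)  # the model retraining without the forget set gives

# In-distribution forget set: drawn like the retain set.
X_in = rng.normal(size=(k, d))
y_in = X_in @ w_true + 0.1 * rng.normal(size=k)
# Out-of-distribution forget set: large features and flipped labels.
X_out = 10.0 * rng.normal(size=(k, d))
y_out = -(X_out @ w_true)


def shift_after_removal(X_f, y_f):
    """Distance between ERM on retain+forget data and ERM on retain data only."""
    w_with = fit_ridge(np.vstack([X_r, X_f]), np.concatenate([y_r, y_f]), lam)
    return np.linalg.norm(w_with - w_retain)


shift_in = shift_after_removal(X_in, y_in)
shift_out = shift_after_removal(X_out, y_out)
print(f"in-distribution shift: {shift_in:.4f}, out-of-distribution shift: {shift_out:.4f}")

# Release the full-data ERM with noise. `sensitivity_bound` is a hypothetical
# worst-case bound; a real certificate needs it derived analytically, not measured.
sensitivity_bound = 0.05
w_full = fit_ridge(np.vstack([X_r, X_in]), np.concatenate([y_r, y_in]), lam)
w_released = output_perturbation(w_full, gaussian_sigma(sensitivity_bound, eps=0.5, delta=1e-5), rng)
```

In this toy setup the out-of-distribution shift is far larger than the in-distribution one, so noise sized to mask the latter would not hide the former. A measured shift is not a certificate, which is why `sensitivity_bound` is kept as a separate, assumed input.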

Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data (data similar to the retain set), we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs. This addresses a previous theoretical gap on the separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail with out-of-distribution forget data (data significantly different from the retain set), where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.
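The abstract does not spell out the robust and noisy gradient descent variant, so the following is only a sketch of the general pattern under stated assumptions: a coordinate-wise trimmed mean stands in for the robust gradient aggregator (an assumption, not the paper's confirmed choice), Gaussian noise with an uncalibrated `sigma` is added at every step, and unlearning is modeled as a few warm-started steps on the retain set instead of retraining from scratch. All function names and hyperparameters are hypothetical.

```python
import numpy as np


def per_sample_grads(w, X, y, lam):
    """Per-sample gradients of 0.5*(x.w - y)^2 + 0.5*lam*||w||^2, shape (n, d)."""
    residual = X @ w - y
    return residual[:, None] * X + lam * w[None, :]


def trimmed_mean(G, trim_frac):
    """Coordinate-wise trimmed mean: drop the floor(trim_frac*n) smallest and
    largest entries in each column, then average the rest (needs trim_frac < 0.5)."""
    n = G.shape[0]
    t = int(np.floor(trim_frac * n))
    S = np.sort(G, axis=0)
    return S[t:n - t].mean(axis=0)


def robust_noisy_gd(w, X, y, steps, lr=0.1, lam=0.01, trim_frac=0.05, sigma=0.01, rng=None):
    """GD with a trimmed-mean gradient (assumed robust aggregator) plus Gaussian noise.
    sigma is NOT calibrated to any (eps, delta) guarantee here."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(steps):
        g = trimmed_mean(per_sample_grads(w, X, y, lam), trim_frac)
        w = w - lr * (g + rng.normal(0.0, sigma, size=w.shape))
    return w


rng = np.random.default_rng(1)
d, n_retain, k = 3, 500, 5
w_true = rng.normal(size=d)
X_r = rng.normal(size=(n_retain, d))
y_r = X_r @ w_true + 0.1 * rng.normal(size=n_retain)
# Out-of-distribution forget set: large features and flipped labels.
X_f = 10.0 * rng.normal(size=(k, d))
y_f = -(X_f @ w_true)
X_all, y_all = np.vstack([X_r, X_f]), np.concatenate([y_r, y_f])

# Deployed model, trained on retain + forget data.
w_full = robust_noisy_gd(np.zeros(d), X_all, y_all, steps=300, rng=rng)
# Unlearning: warm-start from the deployed model, a few steps on retain data only.
w_unlearned = robust_noisy_gd(w_full, X_r, y_r, steps=20, rng=rng)
# Reference: retraining from scratch on the retain set.
w_retrained = robust_noisy_gd(np.zeros(d), X_r, y_r, steps=300, rng=rng)

print(np.linalg.norm(w_unlearned - w_retrained))
```

Because the trimmed mean caps how much the few extreme forget samples can move each step, 20 warm-started steps land close to the 300-step retrained model in this toy setup. This only illustrates why amortizing unlearning cost is plausible; it does not reproduce the paper's algorithm, its noise calibration, or its guarantee.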

