ML LG Oct 17, 2022

Forget Unlearning: Towards True Data-Deletion in Machine Learning

arXiv:2210.08911v2 28.681 citations h-index: 44

Originality: Highly original

AI Analysis

This work addresses privacy concerns for users of machine learning systems. It highlights critical flaws in existing unlearning approaches and offers a more secure alternative, which makes it a foundational shift rather than an incremental improvement.

The paper tackles flawed privacy guarantees in machine unlearning algorithms. It shows that current methods fail to protect the privacy of deleted data for two reasons: records become interdependent when users delete data in response to published models, and cached partial computations leak deleted information over a series of releases. It then proposes a new algorithm, based on noisy gradient descent, that comes with a sound deletion guarantee.

Unlearning algorithms aim to remove deleted data's influence from trained models at a cost lower than full retraining. However, prior guarantees of unlearning in literature are flawed and don't protect the privacy of deleted records. We show that when users delete their data as a function of published models, records in a database become interdependent. So, even retraining a fresh model after deletion of a record doesn't ensure its privacy. Secondly, unlearning algorithms that cache partial computations to speed up the processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these, we propose a sound deletion guarantee and show that the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an accurate, computationally efficient, and secure machine unlearning algorithm based on noisy gradient descent.
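The abstract names noisy gradient descent as the basis of the proposed algorithm but gives no further detail. As a rough illustration of the general idea only, here is a minimal Python sketch: train with Gaussian noise added at every gradient step, and after a deletion request run a few more noisy steps on the remaining records instead of retraining from scratch. The loss (L2-regularized squared error), the hyperparameters, and the function names `noisy_gradient_descent` and `delete_and_update` are all assumptions made for this example, not the paper's algorithm, and the sketch carries none of the paper's guarantees.

```python
import random


def noisy_gradient_descent(data, theta, steps, lr=0.1, l2=0.1, sigma=0.01, rng=None):
    """Run noisy gradient descent on an L2-regularized squared loss.

    data  : list of (features, label) pairs
    theta : starting weights (list of floats); the input list is not modified
    sigma : std. dev. of the Gaussian noise added to each weight at each step
            (illustrative value, not calibrated to any privacy guarantee)
    """
    rng = rng or random.Random(0)
    theta = list(theta)
    n = len(data)
    for _ in range(steps):
        # Gradient of the regularizer (l2 / 2) * ||theta||^2.
        grad = [l2 * w for w in theta]
        # Gradient of the averaged squared error (1 / 2n) * sum (theta.x - y)^2.
        for x, y in data:
            residual = sum(w * xi for w, xi in zip(theta, x)) - y
            for j, xj in enumerate(x):
                grad[j] += residual * xj / n
        # Gradient step plus fresh Gaussian noise on every coordinate.
        theta = [w - lr * g + sigma * rng.gauss(0.0, 1.0) for w, g in zip(theta, grad)]
    return theta


def delete_and_update(data, index, theta, steps, **kwargs):
    """Drop the record at `index`, then fine-tune the current model with a few noisy steps."""
    remaining = data[:index] + data[index + 1:]
    return remaining, noisy_gradient_descent(remaining, theta, steps, **kwargs)


# Toy data: y = 2x + 1, with a constant feature for the intercept.
rng = random.Random(42)
data = [([1.0, x], 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(50))]

# Initial training, then one deletion handled by a short noisy update.
theta = noisy_gradient_descent(data, [0.0, 0.0], steps=200, rng=rng)
data, theta = delete_and_update(data, index=3, theta=theta, steps=20, rng=rng)
```

The update after deletion starts from the current weights and keeps no per-record cached state, which is the property the abstract contrasts with caching-based methods. How many steps and how much noise are needed for a formal guarantee is exactly what the paper analyzes and is not captured here.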
