LG · ML · Nov 8, 2019

Certified Data Removal from Machine Learning Models

arXiv:1911.03030v6 · 652 citations
Originality: Highly original
AI Analysis

The paper addresses data privacy and compliance concerns for users and organizations, and offers a foundational approach to data removal in ML.

The paper tackles the problem of removing data from trained machine-learning models in response to requests from the data's owners. It proposes certified removal, a theoretical guarantee that a model after removal is indistinguishable from one never trained on that data, and reports empirical results showing that the approach is practical for linear classifiers.

Good data stewardship requires removal of data at the request of the data's owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
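To make the idea of removing a point from a linear classifier concrete, here is a minimal sketch, assuming an L2-regularized logistic regression and a single Newton step taken from the trained weights toward the optimum of the objective without the removed sample. The function names (`train`, `newton_remove`), the regularization scaling, and the synthetic data are illustrative assumptions, not taken from the paper's code. The sketch shows approximate removal only: it omits the randomization and the error bound that a certified guarantee would need, so it certifies nothing by itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_sum(w, X, y):
    """Summed gradient of log(1 + exp(-y * w.x)); labels y are in {-1, +1}."""
    return X.T @ (-y * sigmoid(-y * (X @ w)))

def hessian_sum(w, X):
    """Summed Hessian of the logistic loss (it does not depend on the labels)."""
    s = sigmoid(X @ w)
    return (X * (s * (1.0 - s))[:, None]).T @ X

def train(X, y, lam, iters=50):
    """Newton's method on sum_i loss_i(w) + (lam * n / 2) * ||w||^2 (assumed objective)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        g = grad_sum(w, X, y) + lam * n * w
        H = hessian_sum(w, X) + lam * n * np.eye(d)
        w = w - np.linalg.solve(H, g)
    return w

def newton_remove(w, X, y, lam, idx):
    """Approximate removal of sample idx: one Newton step on the reduced objective."""
    X_rest = np.delete(X, idx, axis=0)
    y_rest = np.delete(y, idx)
    m, d = X_rest.shape
    g = grad_sum(w, X_rest, y_rest) + lam * m * w
    H = hessian_sum(w, X_rest) + lam * m * np.eye(d)
    return w - np.linalg.solve(H, g)

if __name__ == "__main__":
    # Hypothetical synthetic data, used only to exercise the functions above.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=200))
    lam = 0.1

    w_full = train(X, y, lam)
    w_removed = newton_remove(w_full, X, y, lam, idx=0)
    w_retrained = train(np.delete(X, 0, axis=0), np.delete(y, 0), lam)

    print("distance before removal:", np.linalg.norm(w_full - w_retrained))
    print("distance after removal: ", np.linalg.norm(w_removed - w_retrained))
```

Comparing against a model retrained from scratch without the sample is the natural sanity check: the closer the updated weights are to the retrained ones, the less information about the removed point remains in the parameters.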

Code Implementations: 1 repo