LG, ML | Nov 8, 2019

Certified Data Removal from Machine Learning Models

Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten

arXiv:1911.03030v639.4668 citations | Has Code

Originality: Highly original

AI Analysis

Certified removal addresses data-privacy and compliance concerns for users and organizations, and offers a foundational approach to removing data from trained ML models.

The paper tackles the problem of removing data from trained machine-learning models in response to removal requests from data owners. It proposes certified removal, a theoretical guarantee that a model after removal is indistinguishable from one never trained on that data. It develops a removal mechanism for linear classifiers and reports empirical results on the learning settings in which that mechanism is practical.

Good data stewardship requires removal of data at the request of the data's owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
