LG · CR · Feb 28, 2022

Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs to be Forgotten

arXiv:2202.13585v1 · 49 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for service providers and model maintainers to respect privacy rights and mitigate adversarial attacks through selective data removal. It appears incremental, since it applies a new method to existing unlearning concepts.

The paper tackles the problem of efficiently removing the effect of specific subsets of training data, such as adversarial data or a user's private data, from machine learning models without full retraining. It proposes a Markov chain Monte Carlo-based unlearning algorithm (MCU) and reports that it outperforms an existing method on real-world phishing and diabetes datasets.

As machine learning (ML) models become increasingly popular in real-world applications, practical challenges in model maintenance need to be addressed. One such challenge is to 'undo' the effect of a specific subset of the dataset used for training a model. This subset may contain malicious or adversarial data injected by an attacker, which degrades model performance. Alternatively, a service provider may need to remove data pertaining to a specific user to respect that user's privacy. In both cases, the problem is to 'unlearn' a specific subset of the training data from a trained model without incurring the cost of retraining the whole model from scratch. Towards this goal, this paper presents a Markov chain Monte Carlo-based machine unlearning (MCU) algorithm. MCU effectively and efficiently unlearns subsets of the training dataset from a trained model. Furthermore, we show that MCU can explain the effect of a subset of the training dataset on the model's predictions. Thus, MCU is useful for examining subsets of data to identify the adversarial data to be removed. Similarly, MCU can be used to erase the lineage of a user's personal data from trained ML models, thus upholding the user's "right to be forgotten". We empirically evaluate the performance of the proposed MCU algorithm on real-world phishing and diabetes datasets. Results show that MCU achieves desirable performance by efficiently removing the effect of a subset of the training dataset, and that it outperforms an existing algorithm that utilizes the remaining dataset.
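The abstract does not spell out how MCU works, so the following is only a toy illustration of the general idea of unlearning from posterior samples, not the paper's algorithm. It assumes a Bayesian model with conditionally independent data, in which case the posterior without an erased subset D_e satisfies p(θ | D \ D_e) ∝ p(θ | D) / p(D_e | θ). Samples drawn once by MCMC from the full posterior can therefore be reweighted instead of rerunning the sampler. The Gaussian-mean model, the prior scale, and all function names (`metropolis`, `unlearn_weights`, `exact_posterior_mean`) are assumptions made for this sketch.

```python
import math
import random


def log_lik(theta, data):
    # Unit-variance Gaussian log-likelihood, up to an additive constant.
    return -0.5 * sum((x - theta) ** 2 for x in data)


def log_prior(theta, scale=10.0):
    # Zero-mean Gaussian prior (assumed for this toy model).
    return -0.5 * (theta / scale) ** 2


def metropolis(data, n_samples=20000, step=0.5, burn_in=2000, seed=0):
    # Random-walk Metropolis sampler targeting p(theta | data).
    rng = random.Random(seed)
    theta = 0.0
    logp = log_prior(theta) + log_lik(theta, data)
    samples = []
    for i in range(n_samples + burn_in):
        prop = theta + rng.gauss(0.0, step)
        logp_prop = log_prior(prop) + log_lik(prop, data)
        # Accept with probability min(1, p(prop) / p(theta)).
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            theta, logp = prop, logp_prop
        if i >= burn_in:
            samples.append(theta)
    return samples


def unlearn_weights(samples, erased):
    # Importance weights proportional to 1 / p(erased | theta),
    # normalised to sum to 1 (computed in log space for stability).
    log_w = [-log_lik(t, erased) for t in samples]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def exact_posterior_mean(data, scale=10.0):
    # Closed-form posterior mean for this conjugate model,
    # used only as a reference for "retraining from scratch".
    return sum(data) / (len(data) + 1.0 / scale ** 2)


retained = [0.8, 0.9, 1.0, 1.1, 1.2] * 6   # data to keep
erased = [2.0, 2.1, 1.9]                   # data to unlearn

samples = metropolis(retained + erased)    # sampled once, on all data
weights = unlearn_weights(samples, erased)

before = sum(samples) / len(samples)
after = sum(w * t for w, t in zip(weights, samples))
print("posterior mean with all data:  ", before)
print("reweighted (unlearned) mean:   ", after)
print("exact mean on retained data:   ", exact_posterior_mean(retained))
```

In this sketch the reweighted estimate should land close to the exact posterior mean on the retained data, without touching the retained points again. The approach degrades when the erased subset shifts the posterior substantially, because a few samples then carry most of the weight.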

