LGMay 16, 2024

Data Selection for Transfer Unlearning

Nazanin Mohammadi Sepahvand, Vincent Dumoulin, Eleni Triantafillou, Gintare Karolina Dziugaite

arXiv:2405.10425v19.25 citationsh-index: 27

Originality Highly original

AI Analysis

This addresses the scenario where data owners withdraw permission for training data use, offering an efficient solution for machine unlearning in transfer learning contexts.

The paper tackles the problem of transfer unlearning, where a pretrained model must adapt to a target dataset containing data that may later need to be unlearned, by proposing a method that selects relevant examples from an auxiliary static dataset for finetuning, which outperforms the gold standard approach, especially with small static sets, sometimes nearing an upper bound for test accuracy.

As deep learning models are becoming larger and data-hungrier, there are growing ethical, legal and technical concerns over use of data: in practice, agreements on data use may change over time, rendering previously-used training data impermissible for training purposes. These issues have driven increased attention to machine unlearning: removing "the influence of" a subset of training data from a trained model. In this work, we advocate for a relaxed definition of unlearning that does not address privacy applications but targets a scenario where a data owner withdraws permission of use of their data for training purposes. In this context, we consider the important problem of \emph{transfer unlearning} where a pretrained model is transferred to a target dataset that contains some "non-static" data that may need to be unlearned in the future. We propose a new method that uses a mechanism for selecting relevant examples from an auxiliary "static" dataset, and finetunes on the selected data instead of "non-static" target data; addressing all unlearning requests ahead of time. We also adapt a recent relaxed definition of unlearning to our problem setting and demonstrate that our approach is an exact transfer unlearner according to it, while being highly efficient (amortized). We find that our method outperforms the gold standard "exact unlearning" (finetuning on only the "static" portion of the target dataset) on several datasets, especially for small "static" sets, sometimes approaching an upper bound for test accuracy. We also analyze factors influencing the accuracy boost obtained by data selection.

View on arXiv PDF

Similar