When unlearning is free: leveraging low influence points to reduce computational costs
This work addresses computational efficiency for practitioners who implement data privacy measures such as unlearning. The contribution is incremental: it builds on existing unlearning methods and optimizes how they are applied.
The paper tackles the computational cost of machine unlearning by identifying low-influence data points, which have negligible impact on model outputs, and proposes an efficient framework that reduces dataset size before unlearning, achieving up to approximately 50% computational savings on real-world examples.
As concerns around data privacy in machine learning grow, the ability to unlearn, or remove, specific data points from trained models becomes increasingly important. While state-of-the-art unlearning methods have emerged in response, they typically treat all points in the forget set equally. In this work, we challenge this approach by asking whether points that have a negligible impact on the model's learning need to be removed at all. Through a comparative analysis of influence functions across language and vision tasks, we identify subsets of training data with negligible impact on model outputs. Leveraging this insight, we propose an efficient unlearning framework that reduces the size of datasets before unlearning, leading to significant computational savings (up to approximately 50 percent) on real-world empirical examples.
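The pruning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prune_forget_set`, the absolute-value thresholding rule, the threshold value, and the toy influence scores are all assumptions made for illustration. The abstract does not specify how influence is estimated or how the cutoff is chosen, so the scores are assumed to be precomputed by some influence-function estimator.

```python
import numpy as np


def prune_forget_set(forget_indices, influence_scores, threshold):
    """Split a forget set into points worth unlearning and points to skip.

    Hypothetical helper: a point is kept for unlearning only if the
    magnitude of its (precomputed) influence score exceeds `threshold`.
    """
    forget_indices = np.asarray(forget_indices)
    magnitudes = np.abs(np.asarray(influence_scores, dtype=float))
    keep_mask = magnitudes > threshold
    return forget_indices[keep_mask], forget_indices[~keep_mask]


# Toy data (made up): six forget-set points and their influence scores.
forget_indices = [3, 17, 42, 58, 91, 104]
influence_scores = [0.80, 0.001, -0.45, 0.002, 0.0005, 0.30]

to_unlearn, skipped = prune_forget_set(
    forget_indices, influence_scores, threshold=0.01
)

# Fraction of the forget set that never reaches the unlearning method.
reduction = len(skipped) / len(forget_indices)
print("unlearn:", to_unlearn.tolist())
print("skip:", skipped.tolist())
print("forget-set reduction:", reduction)
```

Only `to_unlearn` would then be passed to whichever unlearning method is in use. In this toy case half of the forget set is skipped, but the saving in practice depends on the distribution of influence scores and on the cost of computing them, neither of which this sketch models.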