LG Jun 13, 2024

Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, Lisheng Sun Hosoya, Sergio Escalera

arXiv:2406.09073v129.153 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of evaluating and advancing unlearning algorithms for machine learning practitioners. It is incremental in that it builds on existing unlearning research, with a focus on benchmarking.

The paper presents findings from the first NeurIPS competition on unlearning, which attracted nearly 1,200 teams and generated novel solutions, with top-performing entries surpassing existing algorithms under a formal evaluation framework that measures forgetting quality and model utility.

We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In this paper, we analyze top solutions and delve into discussions on benchmarking unlearning, which itself is a research problem. The evaluation methodology we developed for the competition measures forgetting quality according to a formal notion of unlearning, while incorporating model utility for a holistic evaluation. We analyze the effectiveness of different instantiations of this evaluation framework vis-a-vis the associated compute cost, and discuss implications for standardizing evaluation. We find that the ranking of leading methods remains stable under several variations of this framework, pointing to avenues for reducing the cost of evaluation. Overall, our findings indicate progress in unlearning, with top-performing competition entries surpassing existing algorithms under our evaluation framework. We analyze trade-offs made by different algorithms and strengths or weaknesses in terms of generalizability to new datasets, paving the way for advancing both benchmarking and algorithm development in this important area.
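The claim that the ranking of leading methods stays stable across variations of the evaluation framework can be made concrete with a rank-correlation check. The sketch below is not the paper's procedure: it assumes each evaluation variant yields one scalar score per method and compares two variants with Kendall's tau (the tau-a form, without tie correction). The method names and scores are hypothetical and invented purely for illustration.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts keyed by method name."""
    methods = sorted(scores_a)
    concordant = discordant = 0
    for m1, m2 in combinations(methods, 2):
        # A positive product means both variants order this pair the same way.
        product = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores (NOT results from the paper): one full-cost evaluation
# and one cheaper variant of the same evaluation.
full_eval = {"method_a": 0.90, "method_b": 0.75, "method_c": 0.60, "method_d": 0.40}
cheap_eval = {"method_a": 0.85, "method_b": 0.70, "method_c": 0.72, "method_d": 0.35}

tau = kendall_tau(full_eval, cheap_eval)
print(f"Kendall tau between variants: {tau:.3f}")
```

A tau close to 1 would suggest that the cheaper variant preserves the ordering of methods; in this toy data one pair of methods swaps places, which lowers the value.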
