CV · Nov 23, 2024

MUNBa: Machine Unlearning via Nash Bargaining

arXiv:2411.15537v413.516 citations · h-index: 29 · Has Code

Originality: Highly original

AI Analysis

This work provides a more effective solution for machine unlearning, which is important for AI safety and privacy, though it is incremental as it builds on existing unlearning methods by optimizing gradient handling.

The paper tackled the problem of machine unlearning by addressing gradient conflicts between forgetting harmful behaviors and preserving model utility, reformulating it as a cooperative game using Nash bargaining theory to achieve a Pareto stationary point. The method outperformed state-of-the-art algorithms on tasks like image classification and generation, showing improvements in forgetting precision, generalization preservation, and robustness.

Machine Unlearning (MU) aims to selectively erase harmful behaviors from models while retaining the overall utility of the model. As a multi-task learning problem, MU involves balancing objectives related to forgetting specific concepts/data and preserving general performance. A naive integration of these forgetting and preserving objectives can lead to gradient conflicts and dominance, impeding MU algorithms from reaching optimal solutions. To address the gradient conflict and dominance issue, we reformulate MU as a two-player cooperative game, where the two players, namely, the forgetting player and the preservation player, contribute via their gradient proposals to maximize their overall gain and balance their contributions. To this end, inspired by the Nash bargaining theory, we derive a closed-form solution to guide the model toward the Pareto stationary point. Our formulation of MU guarantees an equilibrium solution, where any deviation from the final state would lead to a reduction in the overall objectives for both players, ensuring optimality in each objective. We evaluate our algorithm's effectiveness on a diverse set of tasks across image classification and image generation. Extensive experiments with ResNet, vision-language model CLIP, and text-to-image diffusion models demonstrate that our method outperforms state-of-the-art MU algorithms, achieving a better trade-off between forgetting and preserving. Our results also highlight improvements in forgetting precision, preservation of generalization, and robustness against adversarial attacks.
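To make the two-player bargaining idea concrete, here is a minimal sketch. It is not the paper's derivation or released code. It assumes the bargaining condition commonly used in gradient-bargaining multi-task methods: find positive weights alpha such that (G^T G) alpha = 1/alpha elementwise, where the columns of G are the two gradients. With only two players this has a direct solution: alpha_i = 1 / (||g_i|| * sqrt(1 + cos(phi))), where phi is the angle between the gradients. The function names, the `eps` tolerance, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(x * y for x, y in zip(u, v))


def nash_bargaining_weights(g_forget, g_preserve, eps=1e-12):
    """Weights solving (G^T G) alpha = 1/alpha for two gradients.

    Assumption: this stationarity condition stands in for the bargaining
    solution; it is not taken from the paper's text.
    """
    norm_f = math.sqrt(dot(g_forget, g_forget))
    norm_p = math.sqrt(dot(g_preserve, g_preserve))
    if norm_f < eps or norm_p < eps:
        raise ValueError("a zero gradient leaves the bargaining problem undefined")

    # Cosine of the angle between the two gradient proposals.
    cos_phi = dot(g_forget, g_preserve) / (norm_f * norm_p)
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding drift
    if 1.0 + cos_phi < eps:
        raise ValueError("exactly opposing gradients: no direction helps both players")

    # Each weight is inversely proportional to its gradient's norm, which
    # keeps the larger gradient from dominating the combined direction.
    scale = 1.0 / math.sqrt(1.0 + cos_phi)
    return scale / norm_f, scale / norm_p


def bargained_direction(g_forget, g_preserve):
    """Combine the two gradients using the bargaining weights."""
    a_f, a_p = nash_bargaining_weights(g_forget, g_preserve)
    return [a_f * x + a_p * y for x, y in zip(g_forget, g_preserve)]


# Toy example: the gradients conflict (negative inner product) and differ in scale.
g_f = [3.0, 0.0]
g_p = [-1.0, 1.0]
a_f, a_p = nash_bargaining_weights(g_f, g_p)
d = bargained_direction(g_f, g_p)
print("weights:", a_f, a_p)
print("direction:", d)
print("alignment with each gradient:", dot(d, g_f), dot(d, g_p))
```

Under this condition the combined direction d satisfies d . g_i = 1/alpha_i > 0 for both players, so a gradient-descent step along -d lowers both losses to first order even when the raw gradients conflict. The case of exactly opposing gradients is rejected because no such common direction exists.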
