On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
This work addresses privacy leakage in machine unlearning for models that must forget specific classes, offering an incremental improvement over existing methods.
The paper identifies a privacy vulnerability in class unlearning evaluations that arises from overlooking class geometry, and proposes a fine-tuning objective called Tilted ReWeighting (TRW) to mitigate it. On CIFAR-10, TRW reduces the gap with retrained models by 19% and 46% for U-LiRA and MIA-NN scores, respectively.
In this paper, we reveal a significant shortcoming in class unlearning evaluations: overlooking the underlying class geometry can cause privacy leakage. We further propose a simple yet effective solution to mitigate this issue. We introduce a membership-inference attack via nearest neighbors (MIA-NN) that uses the probabilities the model assigns to neighboring classes to detect unlearned samples. Our experiments show that existing unlearning methods are vulnerable to MIA-NN across multiple datasets. We then propose a new fine-tuning objective that mitigates this privacy leakage by approximating, for forget-class inputs, the distribution over the remaining classes that a retrained-from-scratch model would produce. To construct this approximation, we estimate inter-class similarity and tilt the target model's distribution accordingly. The resulting Tilted ReWeighting (TRW) distribution serves as the target distribution during fine-tuning. We also show that, across multiple benchmarks, TRW matches or surpasses existing unlearning methods on prior unlearning metrics. Specifically, on CIFAR-10, it reduces the gap with retrained models by 19% and 46% for U-LiRA and MIA-NN scores, respectively, compared to the state-of-the-art method in each category.
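
The tilting step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function name `tilted_reweight`, the exponential form of the tilt, the `similarity` vector (one score per class, measuring closeness to the forget class), and the strength parameter `beta` are all hypothetical. The sketch removes the forget class from the model's output, tilts the remaining probabilities toward classes similar to the forget class, and renormalizes.

```python
import numpy as np

def tilted_reweight(probs, forget_class, similarity, beta=1.0, eps=1e-12):
    """Hypothetical tilted target distribution over the remaining classes.

    probs:        model output probabilities, shape (K,) or (N, K)
    forget_class: index of the class being unlearned
    similarity:   assumed per-class similarity to the forget class, shape (K,)
    beta:         assumed tilt strength; beta=0 gives plain renormalization
    """
    probs = np.asarray(probs, dtype=float)
    sim = np.asarray(similarity, dtype=float)

    # Mask that keeps every class except the forget class.
    keep = np.ones(probs.shape[-1], dtype=bool)
    keep[forget_class] = False

    # Zero out the forget class; eps avoids an all-zero row.
    base = np.where(keep, probs + eps, 0.0)

    # Exponential tilt (an assumed form): similar classes gain mass.
    tilted = base * np.exp(beta * sim)

    # Renormalize so each row is a valid distribution.
    return tilted / tilted.sum(axis=-1, keepdims=True)

# Toy example: class 0 is forgotten, class 1 is its nearest neighbor.
target = tilted_reweight([0.7, 0.2, 0.1], forget_class=0,
                         similarity=[1.0, 0.8, 0.1], beta=2.0)
print(target)
```

In a fine-tuning loop, a distribution like `target` could serve as the soft label for forget-class inputs, for example under a KL or cross-entropy loss. How the similarity scores are estimated and how the tilt is parameterized in TRW itself are not specified in the abstract.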