Towards Evaluation for Real-World LLM Unlearning
This work addresses the need for more practical and reliable evaluation methods for LLM unlearning. The contribution is incremental: it improves upon existing metrics rather than introducing a new unlearning paradigm.
The paper tackles the problem of evaluating LLM unlearning in real-world scenarios by proposing a new metric, DCUE. The metric corrects distributional biases in token confidence scores and quantifies the result with the Kolmogorov-Smirnov test. The reported experiments indicate that it overcomes the limitations of existing metrics.
This paper analyzes the limitations of existing unlearning evaluation metrics in terms of practicality, exactness, and robustness in real-world LLM unlearning scenarios. To overcome these limitations, we propose a new metric called Distribution Correction-based Unlearning Evaluation (DCUE). DCUE identifies core tokens and corrects distributional biases in their confidence scores using a validation set. The evaluation results are then quantified using the Kolmogorov-Smirnov test. Experimental results demonstrate that DCUE overcomes the limitations of existing metrics, and these findings can guide the design of more practical and reliable unlearning algorithms in the future.
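To make the pipeline described above more concrete, the following is a minimal sketch of its last two steps: correcting core-token confidence scores with a validation set, then comparing two score distributions with a two-sample Kolmogorov-Smirnov statistic. The abstract does not specify how the correction is computed or how core tokens are selected, so the mean-shift correction, the function names (`ks_statistic`, `correct_bias`), and all numbers here are illustrative assumptions, not the paper's actual procedure.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs,
    a value in [0, 1]; 0 means the samples are indistinguishable.
    """
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed point is enough to find the maximum gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def correct_bias(scores, val_unlearned, val_reference):
    """ASSUMPTION: a simple mean-shift correction (hypothetical).

    Estimates the offset between the unlearned model and a reference
    model on a validation set, then removes it from the scores.
    """
    shift = (sum(val_unlearned) / len(val_unlearned)
             - sum(val_reference) / len(val_reference))
    return [s - shift for s in scores]


# Made-up confidence scores on core tokens of the forget set.
unlearned_scores = [0.52, 0.61, 0.47, 0.58]
reference_scores = [0.41, 0.50, 0.39, 0.46]

# Made-up confidence scores on a validation set unrelated to the forget set.
val_unlearned = [0.72, 0.80, 0.76]
val_reference = [0.62, 0.70, 0.66]

corrected = correct_bias(unlearned_scores, val_unlearned, val_reference)
score = ks_statistic(corrected, reference_scores)
```

Under these assumptions, a smaller `score` would suggest that the unlearned model's corrected confidence distribution is closer to that of the reference model. In practice, a library routine such as SciPy's `scipy.stats.ks_2samp` would also supply a p-value.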