CL, AI, CY, LG | Nov 23, 2024

Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

Microsoft
arXiv:2411.15477v1 | 29 citations | h-index: 15 | EMNLP
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of ensuring robust forgetting of sensitive information in LLMs, which is crucial for privacy and compliance, but it is incremental as it focuses on improving evaluation rather than proposing new unlearning methods.

The paper evaluates machine unlearning techniques in LLMs by testing their robustness to data transformations. It finds that existing methods may fail to prevent recall of forgotten information when the input format changes, which highlights the need for diverse data formats in evaluation.

Large Language Models (LLMs) have been highly successful in a wide range of applications, from regular NLP-based use cases to AI agents. LLMs are trained on a vast corpus of text from various sources; despite the best efforts during data pre-processing, they may pick up undesirable information such as personally identifiable information (PII). Consequently, research on Machine Unlearning (MUL) has recently become active. The main idea is to force LLMs to forget (unlearn) certain information (e.g., PII) without suffering performance loss on regular tasks. In this work, we examine the robustness of existing MUL techniques, that is, their ability to enable leakage-proof forgetting in LLMs. In particular, we examine the effect of data transformation on forgetting: is an unlearned LLM able to recall forgotten information if the format of the input changes? Our findings on the TOFU dataset highlight the necessity of using diverse data formats to quantify unlearning in LLMs more reliably.
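To make the idea of a format change concrete, here is a minimal sketch of how one forget-set question could be re-expressed in other input formats and how a model's output could be checked for leakage. Everything in it is an assumption made for illustration: the `QAPair` type, the two target formats (cloze and multiple choice), the helper names, and the substring-based `leaks` check are hypothetical and are not taken from the paper or from the TOFU dataset's tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAPair:
    """One forget-set item in plain question-answer form (hypothetical type)."""
    question: str
    answer: str


def to_cloze(pair: QAPair, blank: str = "____") -> str:
    """Re-express the item as a fill-in-the-blank prompt."""
    return f"{pair.question} Answer: {blank}"


def to_multiple_choice(pair: QAPair, distractors: List[str]) -> str:
    """Re-express the item as a multiple-choice prompt.

    Options are sorted alphabetically so the correct answer's position
    does not depend on argument order.
    """
    options = sorted([pair.answer, *distractors])
    lines = [pair.question]
    for label, option in zip("ABCD", options):
        lines.append(f"{label}. {option}")
    return "\n".join(lines)


def leaks(model_output: str, pair: QAPair) -> bool:
    """Crude leakage check: does the output contain the forgotten answer?"""
    return pair.answer.lower() in model_output.lower()
```

In use, each transformed prompt would be sent to the unlearned model and `leaks` applied to every response; an item would count as forgotten only if no format elicits the answer. A substring match is a deliberately simple stand-in for whatever scoring a real evaluation uses.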

Code Implementations: 1 repo