CL · AI · Oct 28, 2025

Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs

arXiv:2510.23949v1 · 3 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This addresses a critical evaluation blind spot for researchers and practitioners working on unlearning in multilingual AI systems, though it is incremental as it builds on prior studies.

The paper tackles language confusion in multilingual large language models during unlearning, where a model responds in a different language than the input prompt. It shows that this causes standard reference-based metrics to fail and introduces an N-gram-based score to quantify the issue.

A couple of studies have shown that attempting to erase multilingual knowledge using only English data is insufficient for multilingual LLMs. However, their analyses remain highly performance-oriented. In this paper, we shift the point of view to evaluation and address an additional blind spot that appears when the multilingual LLM is fully fine-tuned on a parallel multilingual dataset before unlearning. Here, language confusion occurs, whereby a model responds in a language different from that of the input prompt. Language confusion is problematic in unlearning because it causes the standard reference-based metrics to fail. We tackle this phenomenon in three steps: (1) introduce the N-gram-based Language-Mix (N-Mix) score to show quantitatively that language confusion is pervasive and consistent in multilingual LLMs, (2) demonstrate that reference-based metrics result in false negatives when the N-Mix score is high, and (3) suggest the need for a new type of unlearning evaluation that can directly assess the content of the generated sentences. We call this type of metric a semantic-based metric.
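
To make the failure mode concrete, here is a minimal Python sketch. It is not the paper's N-Mix definition, which the abstract does not spell out. The names `toy_mix_score` and `unigram_recall` are hypothetical, the example strings are made up, and the score relies on an assumed heuristic: it counts character n-grams whose Unicode script differs from the prompt's dominant script, so it can only separate languages written in different scripts.

```python
import unicodedata
from collections import Counter

def script_of(ch):
    """Coarse script label taken from the Unicode character name (toy heuristic)."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None  # e.g. "LATIN", "HANGUL"

def dominant_script(text):
    """Most frequent script among the letters of `text`, or None if no letters."""
    counts = Counter(s for s in map(script_of, text) if s)
    return counts.most_common(1)[0][0] if counts else None

def toy_mix_score(prompt, response, n=3):
    """Fraction of letter n-grams in `response` containing an off-script letter.

    Hypothetical stand-in for a language-mix score; NOT the paper's N-Mix.
    """
    expected = dominant_script(prompt)
    scripts = [s for s in map(script_of, response) if s]
    grams = [scripts[i:i + n] for i in range(len(scripts) - n + 1)]
    if expected is None or not grams:
        return 0.0
    off = sum(any(s != expected for s in g) for g in grams)
    return off / len(grams)

def unigram_recall(reference, response):
    """Toy reference-based metric: share of reference tokens found in the response."""
    ref = reference.lower().split()
    out = set(response.lower().split())
    return sum(t in out for t in ref) / len(ref) if ref else 0.0

# Made-up example: the fact should have been unlearned.
prompt = "Where was the author born"
reference = "The author was born in Seoul"
leak_en = "The author was born in Seoul"      # leak in the prompt language
leak_ko = "작가는 서울에서 태어났습니다"          # same fact, answered in Korean

for response in (leak_en, leak_ko):
    print(toy_mix_score(prompt, response), unigram_recall(reference, response))
```

In this toy setup both responses reveal the same fact, but the Korean answer shares no tokens with the English reference, so the overlap metric would report the fact as forgotten. That is the kind of false negative the abstract attributes to a high language-mix score, and it is why an evaluation that inspects the meaning of the output is needed.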

