LG · AI · Feb 3

Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures

Sangyeon Yoon, Hyesoo Hong, Wonje Jeung, Albert No

arXiv:2602.03379v1 · 3.82 citations · h-index: 6

Originality: Incremental advance

AI Analysis

The paper addresses the fragility of machine unlearning methods and is assessed as an incremental improvement in a domain-specific area.

The paper tackles the problem of benign relearning in machine unlearning, where forgotten information reemerges during fine-tuning, and finds that syntactic similarity, not topical relevance, is the primary driver. The result is a syntactic diversification method that suppresses relearning, accelerates forgetting, and reduces the trade-off between unlearning efficacy and model utility.

Machine unlearning aims to remove specific content from trained models while preserving overall performance. However, the phenomenon of benign relearning, in which forgotten information reemerges even from benign fine-tuning data, reveals that existing unlearning methods remain fundamentally fragile. A common explanation attributes this effect to topical relevance, but we find this account insufficient. Through systematic analysis, we demonstrate that syntactic similarity, rather than topicality, is the primary driver: across benchmarks, syntactically similar data consistently trigger recovery even without topical overlap, due to their alignment in representations and gradients with the forgotten content. Motivated by this insight, we introduce syntactic diversification, which paraphrases the original forget queries into heterogeneous structures prior to unlearning. This approach effectively suppresses benign relearning, accelerates forgetting, and substantially alleviates the trade-off between unlearning efficacy and model utility.
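As a rough illustration of the idea of syntactic diversification, the sketch below expands each forget query into several structurally different phrasings before it would be passed to an unlearning routine. The abstract does not say how the paraphrases are produced, so the fixed templates here are only a stand-in, and the names `ForgetExample`, `TEMPLATES`, and `diversify` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class ForgetExample:
    """One item of a forget set (hypothetical schema, for illustration only)."""
    subject: str
    attribute: str
    answer: str


# Hypothetical templates with deliberately different sentence structures:
# a direct question, a fronted topic, a declarative plus imperative, and a command.
TEMPLATES: Tuple[str, ...] = (
    "What is the {attribute} of {subject}?",
    "Regarding {subject}, could you state the {attribute}?",
    "{subject} has a certain {attribute}; name it.",
    "Tell me the {attribute} that belongs to {subject}.",
)


def diversify(
    examples: Iterable[ForgetExample],
    templates: Sequence[str] = TEMPLATES,
) -> List[Tuple[str, str]]:
    """Return (query, answer) pairs, one per example and template.

    Every rewritten query keeps the same target answer, so the content to be
    forgotten is unchanged while its surface syntax varies.
    """
    pairs: List[Tuple[str, str]] = []
    for ex in examples:
        for template in templates:
            query = template.format(subject=ex.subject, attribute=ex.attribute)
            pairs.append((query, ex.answer))
    return pairs


if __name__ == "__main__":
    forget_set = [ForgetExample("Jane Doe", "birthplace", "Springfield")]
    for query, answer in diversify(forget_set):
        print(f"{query} -> {answer}")
```

The diversified pairs would replace the original single-structure forget queries as input to whatever unlearning objective is in use; that step is outside the scope of this sketch.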
