LG · AI · CL · Nov 27, 2024

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?

arXiv:2411.18797v2 · 10 citations · h-index: 20 · ACL
Originality: Incremental advance
AI Analysis

This paper addresses unlearning for MoE LLMs, offering a novel solution in a previously unexplored setting, though it builds incrementally on standard unlearning algorithms.

The paper tackles effective and efficient unlearning of unwanted knowledge from sparse Mixture-of-Experts (MoE) LLMs, where existing methods cause excessive forgetting and utility drops. It proposes the Selected-Expert Unlearning Framework (SEUF), which improves forget quality by up to 5% and model utility by 35% while unlearning only 0.06% of the model parameters.

Recent advancements in LLM unlearning have shown remarkable success in removing unwanted data-model influences while preserving the model's utility for legitimate knowledge. Despite these strides, sparse Mixture-of-Experts (MoE) LLMs, a key subset of the LLM family, have remained unexplored in the context of unlearning. As MoE LLMs are celebrated for their exceptional performance, we ask: How can unlearning be performed effectively and efficiently on MoE LLMs? Our pilot study shows that the dynamic routing nature of MoE LLMs introduces unique challenges, leading to excessive forgetting, uncontrolled knowledge erasure, and substantial utility drops when existing unlearning methods are applied. To address this, we propose a novel Selected-Expert Unlearning Framework (SEUF). Through expert attribution, unlearning is concentrated on the most actively engaged experts for the specified knowledge. Concurrently, an anchor loss is applied to the router to stabilize the active state of this targeted expert, ensuring focused and controlled unlearning. SEUF is compatible with various standard unlearning algorithms. Extensive experiments demonstrate that, compared to standard unlearning algorithms, SEUF improves forget quality by up to 5% and model utility by 35% on MoE LLMs across various benchmarks and LLM architectures, while unlearning only 0.06% of the model parameters.
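
The two mechanisms named in the abstract, expert attribution and a router anchor loss, can be illustrated with a toy sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: attribution is approximated by counting how often each expert lands in the router's top-k on forget-set tokens, and the anchor loss is taken to be a cross-entropy that pulls the router's distribution toward the selected expert. The names `attribute_expert` and `anchor_loss` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(logits, k):
    """Indices of the k largest router logits for one token."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def attribute_expert(router_logits_per_token, k=2):
    """Assumed attribution rule: the expert routed to most often on forget data."""
    counts = Counter()
    for logits in router_logits_per_token:
        counts.update(top_k_experts(logits, k))
    expert, _ = counts.most_common(1)[0]
    return expert, counts


def anchor_loss(router_logits_per_token, target_expert):
    """Assumed anchor loss: mean cross-entropy toward the target expert."""
    losses = [-math.log(softmax(logits)[target_expert])
              for logits in router_logits_per_token]
    return sum(losses) / len(losses)


# Toy router logits for three forget-set tokens over four experts.
forget_logits = [
    [2.0, 0.1, -1.0, 0.5],
    [1.5, 0.2, 0.0, 1.0],
    [0.3, 0.1, 2.2, 1.9],
]
target, counts = attribute_expert(forget_logits, k=2)
loss = anchor_loss(forget_logits, target)
```

In a full pipeline under these assumptions, only the parameters of the attributed expert would receive gradients from the chosen unlearning objective, while a term like `anchor_loss` would be added to the router's objective so that forget-set tokens keep being routed to that expert during unlearning.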

