CR | AI | CL | LG | Oct 30, 2024

Stealing User Prompts from Mixture of Experts

DeepMind
arXiv:2410.22884v1 | 11 citations | h-index: 35
Originality Highly original
AI Analysis

This introduces a new class of LLM vulnerabilities for users of MoE models, though it is incremental as it builds on known routing mechanisms.

The paper tackles the problem of extracting user prompts from Mixture-of-Experts (MoE) models by exploiting Expert-Choice-Routing vulnerabilities, achieving full disclosure of a victim's prompt with O(VM^2) queries or about 100 queries per token on average in a two-layer Mixtral model.

Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer. In this paper, we show how an adversary that can arrange for their queries to appear in the same batch of examples as a victim's queries can exploit Expert-Choice-Routing to fully disclose a victim's prompt. We successfully demonstrate the effectiveness of this attack on a two-layer Mixtral model, exploiting the tie-handling behavior of the torch.topk CUDA implementation. Our results show that we can extract the entire prompt using $O(VM^2)$ queries (with vocabulary size $V$ and prompt length $M$) or 100 queries on average per token in the setting we consider. This is the first attack to exploit architectural flaws for the purpose of extracting user prompts, introducing a new class of LLM vulnerabilities.
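The cross-batch dependence that the abstract relies on can be illustrated with a toy sketch. This is not the paper's attack or Mixtral's router: the function name `expert_choice_route`, the affinity scores, the capacity of 2, and the lowest-index tie-break are all assumptions made for illustration (the paper instead exploits the tie-handling of the torch.topk CUDA implementation). The sketch only shows that, when each expert keeps its top-scoring tokens from the whole batch, an adversary's tokens can change whether a victim's token is processed by a given expert.

```python
# Toy sketch of Expert-Choice routing (hypothetical names and scores).
# Each expert keeps the `capacity` highest-scoring tokens from the whole
# batch, so tokens from different users compete for the same slots.

def expert_choice_route(scores, capacity):
    """scores[t][e] is the affinity of token t for expert e.

    Returns one list per expert with the sorted token indices it accepts.
    Ties are broken by lower token index (an assumption for this sketch).
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = []
    for e in range(num_experts):
        # Rank tokens by descending score, then by ascending index.
        ranked = sorted(range(num_tokens), key=lambda t: (-scores[t][e], t))
        assignment.append(sorted(ranked[:capacity]))
    return assignment

# Batch layout: token 0 belongs to the victim, the rest to the adversary.
victim = [[0.6, 0.4]]
benign_adversary = [[0.1, 0.9], [0.2, 0.8]]     # prefers expert 1
crowding_adversary = [[0.9, 0.1], [0.8, 0.2]]   # fills expert 0

alone = expert_choice_route(victim + benign_adversary, capacity=2)
crowded = expert_choice_route(victim + crowding_adversary, capacity=2)

# Does expert 0 still accept the victim's token in each batch?
victim_kept_alone = 0 in alone[0]
victim_kept_crowded = 0 in crowded[0]
print(victim_kept_alone, victim_kept_crowded)
```

In this toy setting the victim's token is accepted by expert 0 when the adversary's tokens prefer the other expert, and displaced when the adversary fills expert 0. A routing decision that depends on other users' tokens in the same batch is the kind of side channel the abstract describes.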
