CL · CR · Feb 4

Expert Selections In MoE Models Reveal (Almost) As Much As Text

arXiv:2602.04105v1 · 3 citations
Originality: Incremental advance
AI Analysis

The attack exposes a security vulnerability in MoE deployments such as distributed inference, where routing data visible to another party could leak sensitive text. The advance is incremental, but it matters for privacy in AI systems.

The paper tackles the problem of information leakage in mixture-of-experts (MoE) language models by showing that expert routing decisions can be exploited to reconstruct text tokens, achieving up to 91.2% top-1 accuracy on 32-token sequences from OpenWebText.

We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE deployments should be treated as sensitive as the underlying text.
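To make "expert selections" concrete, the following is a minimal sketch of the signal such an attack would observe, not the authors' implementation. It assumes plain top-k routing over router logits and encodes each token's selections as one multi-hot vector per layer; the configuration (64 experts, top-8, 24 layers), the vocabulary size of 50,000, and all function names are hypothetical and chosen only for illustration. It also computes a loose upper bound on how many bits one token's selections could carry, to compare against the bits needed to name a single vocabulary item.

```python
import math
import random


def top_k_experts(router_logits, k):
    """Return the indices of the k largest router logits, sorted ascending."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    return tuple(sorted(ranked[:k]))


def routing_signature(per_layer_logits, k):
    """Concatenate one multi-hot vector per layer into a flat feature vector.

    This is the kind of input a decoder (for example an MLP) could be
    trained on; the decoder itself is not shown here.
    """
    features = []
    for logits in per_layer_logits:
        chosen = set(top_k_experts(logits, k))
        features.extend(1 if i in chosen else 0 for i in range(len(logits)))
    return features


def signature_capacity_bits(num_experts, k, num_layers):
    """Upper bound, in bits, on the information in one token's selections.

    Assumes every k-subset is equally likely in every layer, which real
    routers do not satisfy, so the true figure is lower.
    """
    return num_layers * math.log2(math.comb(num_experts, k))


# Hypothetical configuration, not taken from the paper.
NUM_EXPERTS, TOP_K, NUM_LAYERS = 64, 8, 24
HYPOTHETICAL_VOCAB_SIZE = 50_000

rng = random.Random(0)
# Random logits stand in for a real router's output for a single token.
logits = [[rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
          for _ in range(NUM_LAYERS)]

sig = routing_signature(logits, TOP_K)
capacity = signature_capacity_bits(NUM_EXPERTS, TOP_K, NUM_LAYERS)
vocab_bits = math.log2(HYPOTHETICAL_VOCAB_SIZE)

print(f"signature length: {len(sig)}, active entries: {sum(sig)}")
print(f"capacity bound: {capacity:.0f} bits vs. {vocab_bits:.1f} bits per token id")
```

Under these assumed numbers the capacity bound exceeds the bits needed to identify one token by a wide margin, which is consistent with the abstract's claim that routing decisions can carry enough information for reconstruction. The bound says nothing about how much of that capacity a real router actually uses; that is what the reported accuracies measure.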

