AI · CL · LG · Apr 20

Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs

arXiv:2604.17837 · 93.8 · h-index: 2
AI Analysis

This work offers a new interpretability lens for Mixture-of-Experts (MoE) models. It argues that the natural unit of interpretability is the trajectory (the path of experts a token is routed through across layers) rather than the individual expert, a view that could help researchers understand and improve large-scale MoE LLMs.

The paper introduces a parameter-free decomposition for MoE models that separates each layer's hidden state into a control signal, which drives routing, and an orthogonal content channel. Across six MoE architectures, the authors find that the control signal encodes abstract functions that rotate from layer to layer, and that expert paths become monosemantic, clustering tokens by semantic function, even though individual experts remain polysemantic.
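The decomposition can be sketched as a projection, under an assumption that is ours and not stated in the summary: the router is a single linear map producing logits W_r h, and the control signal is the orthogonal projection of h onto the row space of W_r. Under that assumption the content channel lies in the router's null space, so it is invisible to the router by construction and nothing is learned (parameter-free). The symbol W_r and the helpers `row_space_basis`, `decompose`, and `router_logits` are hypothetical, and the paper's exact construction (for example, how it treats any normalization applied before the router) may differ.

```python
"""Sketch: split a hidden state into a router-visible control signal and an
orthogonal, router-invisible content channel, assuming a linear router."""
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def row_space_basis(router_rows: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Orthonormal basis of the span of the router's rows (modified Gram-Schmidt).

    Rows that are numerically linear combinations of earlier rows are skipped,
    so a rank-deficient router is handled.
    """
    basis: List[Vector] = []
    for row in router_rows:
        v = list(row)
        for q in basis:
            coeff = dot(v, q)
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def decompose(hidden: Vector, router_rows: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (control, content) with hidden == control + content.

    control: orthogonal projection of `hidden` onto the router's row space.
    content: the remainder, which lies in the router's null space.
    """
    control = [0.0] * len(hidden)
    for q in row_space_basis(router_rows):
        coeff = dot(hidden, q)
        control = [c + coeff * qi for c, qi in zip(control, q)]
    content = [h - c for h, c in zip(hidden, control)]
    return control, content


def router_logits(hidden: Vector, router_rows: List[Vector]) -> Vector:
    """Hypothetical linear router: one logit per expert, logits = W_r @ hidden."""
    return [dot(row, hidden) for row in router_rows]


# Usage with a hypothetical 2-expert router over a 4-dimensional residual stream.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
h = [3.0, -1.0, 2.0, 5.0]
control, content = decompose(h, W)
print(control)                     # [3.0, -1.0, 0.0, 0.0]
print(content)                     # [0.0, 0.0, 2.0, 5.0]
print(router_logits(content, W))   # [0.0, 0.0]
```

Because top-k expert selection depends only on the logits, and under this assumption the logits depend only on the control component, routing is fully determined by the control signal while the content channel passes through untouched.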

An LLM's residual stream is both state and instruction: it encodes the current context and determines the next transformation. We introduce a parameter-free decomposition for Mixture-of-Experts models that splits each layer's hidden state into a control signal that causally drives routing and an orthogonal content channel invisible to the router. Across six MoE architectures, we find that models preserve surface-level features (language, token identity, position) in the content channel, while the control signal encodes an abstract function that rotates from layer to layer. Because each routing decision is low-bandwidth, this hand-off forces compositional specialization across layers. While individual experts remain polysemantic, expert paths become monosemantic, clustering tokens by semantic function across languages and surface forms. The same token (e.g., ":") follows distinct trajectories depending on whether it serves as a type annotation, an introductory colon, or a time separator. Our decomposition identifies the source of this structure: clusters in the control subspace are substantially more monosemantic than those in the full representation. As a result, the natural unit of interpretability in MoEs is not the expert but the trajectory.
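The claim that paths, not single experts, are the monosemantic unit can be illustrated with a toy sketch. It assumes top-1 routing, defines a token's trajectory as the tuple of selected experts at each layer, and uses cluster purity (the share of tokens matching their cluster's majority label) as a stand-in monosemanticity score; the paper's own metric is not given here. The router logits and labels below are made up to mimic the ":" example and are not data from the paper, and all function names are hypothetical.

```python
"""Toy sketch: expert paths (trajectories) vs. single experts as clustering units."""
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Path = Tuple[Tuple[int, ...], ...]


def top_k(logits: Sequence[float], k: int = 1) -> Tuple[int, ...]:
    """Indices of the k largest router logits, sorted so order within the set is ignored."""
    ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return tuple(sorted(ranked[:k]))


def expert_path(per_layer_logits: Sequence[Sequence[float]], k: int = 1) -> Path:
    """A token's trajectory: its selected expert set at every MoE layer, in order."""
    return tuple(top_k(layer, k) for layer in per_layer_logits)


def group(tokens: Dict[str, List[List[float]]],
          key: Callable[[List[List[float]]], Hashable]) -> Dict[Hashable, List[str]]:
    """Cluster token occurrences by any routing-derived key."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for token_id, logits in tokens.items():
        groups[key(logits)].append(token_id)
    return dict(groups)


def purity(groups: Dict[Hashable, List[str]], labels: Dict[str, str]) -> float:
    """Fraction of tokens whose label equals their cluster's majority label."""
    total = sum(len(members) for members in groups.values())
    agree = 0
    for members in groups.values():
        counts: Dict[str, int] = defaultdict(int)
        for m in members:
            counts[labels[m]] += 1
        agree += max(counts.values())
    return agree / total


# Made-up router logits for six occurrences of ":" (2 layers x 3 experts).
toy_logits = {
    "x: int":       [[2.0, 0.1, 0.0], [0.0, 1.5, 0.2]],
    "y: str":       [[1.8, 0.3, 0.1], [0.1, 1.2, 0.0]],
    "Note: ...":    [[1.9, 0.2, 0.4], [0.0, 0.3, 1.4]],
    "Warning: ...": [[1.7, 0.0, 0.5], [0.2, 0.1, 1.1]],
    "10:30":        [[0.1, 2.2, 0.3], [0.0, 0.4, 1.6]],
    "09:15":        [[0.2, 1.9, 0.0], [0.3, 0.2, 1.3]],
}
labels = {
    "x: int": "type annotation", "y: str": "type annotation",
    "Note: ...": "introductory", "Warning: ...": "introductory",
    "10:30": "time separator", "09:15": "time separator",
}

by_path = group(toy_logits, expert_path)                   # full trajectory
by_expert = group(toy_logits, lambda lg: top_k(lg[0], 1))  # layer-0 expert only
print(purity(by_path, labels))    # 1.0
print(purity(by_expert, labels))  # 4 of 6 tokens agree, about 0.67
```

In this constructed example, expert 0 at layer 0 receives both type annotations and introductory colons, so grouping by a single expert mixes roles, while each full path isolates one role. The purities follow from the toy data by construction and are not measurements from the paper.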
