Geometric Asymmetry in MoE Specialization: Functional Decorrelation and Representational Overlap
For researchers studying MoE architectures, this work provides a geometric interpretation and diagnostic framework for understanding expert specialization, though the findings are primarily observational and incremental.
The paper identifies a structural asymmetry in Mixture-of-Experts (MoE) layers: experts show strong functional decorrelation (near-zero cross-expert Jacobian alignment), while their representations occupy distinct but partially overlapping subspaces. Controlled routing experiments point to routing sparsity as a key factor shaping this geometry: top-k routing induces sharper functional separation and larger subspace divergence than soft routing.
Mixture-of-Experts (MoE) architectures achieve scalable capacity through sparse routing, yet the geometric structure of expert specialization remains poorly understood. We introduce a unified Jacobian-PCA-Grassmann framework for analyzing MoE layers in both function space and representation space. Across pretrained MoE Transformers (Mistral, Qwen), we find a consistent structural asymmetry: experts exhibit strong functional decorrelation (consistently near-zero cross-expert Jacobian alignment), while their routed representations occupy distinct but partially overlapping subspaces. Functional decorrelation and representational overlap therefore coexist rather than coincide in MoE specialization. Controlled routing experiments further suggest that routing sparsity is a key factor shaping this geometry: top-k routing induces sharper functional separation and larger subspace divergence, whereas fully soft routing yields a more entangled expert structure. Together, these results suggest a geometric interpretation in which MoE layers implement locally decorrelated operators over overlapping submanifolds of a shared representation manifold, and they provide a general diagnostic framework for studying conditional computation in modern Transformer architectures.
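The abstract names the ingredients of the framework (Jacobians, PCA, Grassmann geometry) but does not give exact definitions, so the following is only a minimal sketch under stated assumptions. It assumes that "Jacobian alignment" is a cosine similarity under the Frobenius inner product, that each expert's representation subspace is the span of the top-k principal components of its routed activations, and that subspace divergence is the geodesic Grassmann distance computed from principal angles. The toy expert and all function names (`expert_jacobian`, `jacobian_alignment`, `pca_basis`, `principal_angles`, `grassmann_distance`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def expert_jacobian(W1, W2, x):
    """Analytic Jacobian of a toy expert f(x) = W2 @ tanh(W1 @ x) at input x.

    Assumption: a two-layer tanh MLP stands in for a real expert FFN.
    """
    h = np.tanh(W1 @ x)
    # d f / d x = W2 @ diag(1 - h^2) @ W1
    return W2 @ ((1.0 - h**2)[:, None] * W1)


def jacobian_alignment(J_a, J_b):
    """Cosine similarity of two Jacobians under the Frobenius inner product."""
    num = np.sum(J_a * J_b)
    den = np.linalg.norm(J_a) * np.linalg.norm(J_b)
    return float(num / den)


def pca_basis(X, k):
    """Orthonormal basis (d x k) of the top-k principal directions of the rows of X."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def principal_angles(U, V):
    """Principal angles (radians) between the column spans of orthonormal U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))


def grassmann_distance(U, V):
    """Geodesic distance on the Grassmannian: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(U, V)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, hidden, n, k = 16, 32, 200, 4  # illustrative sizes only

    # Two toy experts with independent random weights (stand-ins for trained experts).
    experts = [
        (rng.normal(size=(hidden, d)) / np.sqrt(d),
         rng.normal(size=(d, hidden)) / np.sqrt(hidden))
        for _ in range(2)
    ]

    # Function space: compare the two experts' Jacobians at the same input.
    x = rng.normal(size=d)
    J0, J1 = (expert_jacobian(W1, W2, x) for W1, W2 in experts)
    print("Jacobian alignment:", jacobian_alignment(J0, J1))

    # Representation space: compare PCA subspaces of each expert's inputs.
    # Random data stands in for the tokens a router would send to each expert.
    X0 = rng.normal(size=(n, d))
    X1 = rng.normal(size=(n, d))
    U0, U1 = pca_basis(X0, k), pca_basis(X1, k)
    print("Principal angles:", principal_angles(U0, U1))
    print("Grassmann distance:", grassmann_distance(U0, U1))
```

Under these assumptions, the asymmetry described above would show up as an alignment close to zero together with principal angles that sit well below π/2 for at least some directions, meaning the subspaces are distinct but not orthogonal. Applying the same functions to a real model would require extracting per-expert Jacobians (for example via automatic differentiation) and the hidden states actually routed to each expert, which this sketch does not attempt.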