DS · AI · LG · May 27

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router

arXiv:2605.29121 · 26.5 · h-index: 9
Predicted impact: top 47% in DS (last 90 days) · Originality: Incremental advance
AI Analysis

This work provides a theoretical mechanism for abrupt load imbalance in MoE routers, which is relevant for understanding and potentially mitigating routing collapse in large-scale models.

The authors propose a minimal dynamical model of adaptive softmax routing in a two-expert Mixture-of-Experts layer, showing that load imbalance arises via a supercritical pitchfork bifurcation. They derive exact bifurcation equations and validate the model with numerical experiments, including a small classification task on digits.
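The summary does not give the equations, so here is a worked derivation for one hypothetical instance of such a model. It rests on our own assumptions: scores s_1 and s_2 decay at rate λ, routing is a softmax with feedback strength (inverse temperature) β, and h is an external logit bias on expert 1. Under a discrete update s_i ← (1 − ηλ) s_i + η·[i selected], the expected per-step change is η(p_i − λ s_i), which gives the mean-field system in the first line. The sketch shows how a supercritical pitchfork and a cusp arise in this setting; the paper's exact equations and parametrization may differ.

```latex
% Assumed mean-field limit (eta -> 0, time t = n * eta)
\dot s_i = p_i - \lambda s_i, \qquad
p_1 = \frac{e^{\beta s_1 + b_1}}{e^{\beta s_1 + b_1} + e^{\beta s_2 + b_2}}, \qquad p_2 = 1 - p_1.

% Score gap x = s_1 - s_2, external asymmetry h = b_1 - b_2 (since p_1 - p_2 = \tanh((a_1 - a_2)/2))
\dot x = \tanh\!\left(\frac{\beta x + h}{2}\right) - \lambda x.

% Symmetric case h = 0: expand around the balanced state x = 0
\dot x = \left(\frac{\beta}{2} - \lambda\right) x - \frac{\beta^3}{24}\, x^3 + O(x^5)
\quad\Longrightarrow\quad \beta_c = 2\lambda \ \text{(negative cubic term: supercritical pitchfork).}

% Fold set for lambda = 1: \dot x = 0 and \partial_x \dot x = 0, with u = (\beta x + h)/2
\beta(u) = 2\cosh^2 u, \qquad h(u) = 2u - \sinh 2u, \qquad x(u) = \tanh u,
% the branches u > 0 and u < 0 meet in a cusp at (\beta, h) = (2, 0).

% Local normal form near the cusp: lambda = 1, beta = 2 + \varepsilon, small x and h
\dot x \approx \frac{h}{2} + \frac{\varepsilon}{2}\, x - \frac{x^3}{3}.
```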

We propose a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts (MoE) layer. The model is obtained as a mean-field limit of a discrete reinforcement rule: the selected expert receives a small score increment, while all scores undergo regularizing decay. In the symmetric case the limiting system has a supercritical pitchfork bifurcation: for weak feedback there is a unique stable balanced state, whereas above a critical feedback strength two stable asymmetric states appear. When an external asymmetry is added, the pitchfork unfolds into a pair of fold bifurcations forming a cusp in the control-parameter plane. We derive exact parametric equations for the bifurcation set and the local normal form of the cusp catastrophe. Numerical experiments connect this picture to empirical expert load, a small trainable MoE model, hard top-1 PyTorch routing, and a small classification experiment on digits. The results provide a controlled low-dimensional mechanism for abrupt transitions to load imbalance in adaptive MoE routers.
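The discrete reinforcement rule described above can be sketched in a few lines of Python. This is not the authors' code. It assumes a specific hypothetical form: softmax logits β·s_i + b_i, a logit bias h = b_1 − b_2, multiplicative decay by (1 − ηλ) on both scores, and an increment η for the selected expert. All function names (`route_prob`, `simulate_load`, `mean_field_drift`, `fold_point`) are illustrative. In this assumed model the balanced state loses stability at β = 2λ, and `fold_point` gives one possible parametrization of the fold curves for λ = 1.

```python
import math
import random

# Hypothetical two-expert router (an assumption, not the paper's exact model):
# logits a_i = beta * s_i + b_i, softmax routing, h = b_1 - b_2 is an external bias.

def route_prob(s1: float, s2: float, beta: float, h: float = 0.0) -> float:
    """Softmax probability of routing a token to expert 1 (a sigmoid of the logit gap)."""
    return 1.0 / (1.0 + math.exp(-(beta * (s1 - s2) + h)))

def simulate_load(beta: float, lam: float = 1.0, eta: float = 0.01,
                  steps: int = 100_000, h: float = 0.0, seed: int = 0) -> float:
    """Run the discrete reinforcement rule and return expert 1's empirical load.

    Each step samples one expert from the softmax, decays both scores by
    (1 - eta * lam), and adds eta to the selected expert's score.
    The load is measured over the second half of the run (after burn-in).
    """
    rng = random.Random(seed)
    s1 = s2 = 0.5 / lam  # balanced start; the total score then stays at 1 / lam
    burn_in = steps // 2
    hits = 0
    for n in range(steps):
        pick_first = rng.random() < route_prob(s1, s2, beta, h)
        s1 = (1.0 - eta * lam) * s1 + (eta if pick_first else 0.0)
        s2 = (1.0 - eta * lam) * s2 + (0.0 if pick_first else eta)
        if n >= burn_in and pick_first:
            hits += 1
    return hits / (steps - burn_in)

def mean_field_drift(x: float, beta: float, lam: float = 1.0, h: float = 0.0) -> float:
    """Mean-field drift of the score gap x = s1 - s2 in the limit eta -> 0."""
    return math.tanh((beta * x + h) / 2.0) - lam * x

def fold_point(u: float) -> tuple[float, float, float]:
    """Point (beta, h, x) on the fold set for lam = 1, parametrized by u = (beta*x + h)/2."""
    return 2.0 * math.cosh(u) ** 2, 2.0 * u - math.sinh(2.0 * u), math.tanh(u)

if __name__ == "__main__":
    # In this assumed model the critical feedback strength is beta = 2 * lam = 2.
    for beta in (1.0, 4.0):
        print(f"beta={beta}: expert-1 load = {simulate_load(beta):.3f}")
```

With these defaults, `simulate_load(1.0)` should stay close to a 0.5 load, while `simulate_load(4.0)` should settle with one expert taking nearly all tokens; which expert wins depends on the random seed, as expected for spontaneous symmetry breaking.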
