May 26

Birkhoff Decompositions and Photonic Interconnects Wait! Don't Forget the Compute!

arXiv:2605.26845

AI Analysis

For practitioners deploying MoE models with photonic interconnects, this work addresses a practical bottleneck in communication-compute overlap, offering a simple yet effective scheduling improvement.

The paper identifies two problems with using Birkhoff–von Neumann (BvN) decomposition to schedule all-to-all communication in Mixture-of-Experts (MoE) models. First, MoE traffic matrices are rarely doubly stochastic, which introduces scheduling bubbles. Second, BvN produces an excessive number of matchings, which fragments expert computation into small batches dominated by fixed overheads. The authors propose a greedy max-weight decomposition that bounds the number of matchings, improves communication–compute overlap, and approaches the performance of an ideal congestion-free all-to-all.
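Birkhoff–von Neumann (BvN) decomposition writes a doubly stochastic matrix as a weighted sum of permutation matrices. A raw traffic matrix whose rows and columns have different sums must therefore be padded first. The Python sketch below is illustrative only. It uses one standard padding method: it adds dummy traffic until every row and column sums to the largest line sum L, then measures how much of the resulting circuit time carries no real tokens. The traffic values, the function names, and the northwest-corner fill rule are all assumptions. The paper may handle non-doubly-stochastic inputs differently.

```python
def line_sums(m):
    """Row sums (tokens each GPU sends) and column sums (tokens each GPU receives)."""
    rows = [sum(r) for r in m]
    cols = [sum(c) for c in zip(*m)]
    return rows, cols


def pad_to_equal_line_sums(m):
    """Hypothetical helper: add dummy traffic (northwest-corner fill) until every
    row and column sums to L, the largest line sum. The padded matrix divided
    by L is doubly stochastic and could then be fed to a BvN decomposition."""
    n = len(m)
    padded = [row[:] for row in m]
    rows, cols = line_sums(padded)
    L = max(rows + cols)
    for i in range(n):
        for j in range(n):
            # Fill as much of the remaining row and column deficit as possible.
            add = min(L - rows[i], L - cols[j])
            if add > 0:
                padded[i][j] += add
                rows[i] += add
                cols[j] += add
    return padded, L


# Hypothetical skewed dispatch matrix: traffic[i][j] = tokens GPU i sends to GPU j.
traffic = [[0, 6, 1],
           [2, 0, 1],
           [1, 1, 0]]

padded, L = pad_to_equal_line_sums(traffic)
real = sum(map(sum, traffic))
dummy = sum(map(sum, padded)) - real
print(padded)  # [[0, 6, 1], [6, 0, 1], [1, 1, 5]]
print(L, real, dummy)  # 7 12 9
print(f"{dummy / (len(traffic) * L):.2f}")  # 0.43
```

In this toy case, 9 of the 21 port-time units in a length-7 schedule are padding. The circuits touching GPUs 1 and 2, which send only 3 and 2 real tokens, carry dummy traffic for most of the schedule.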

The growing demand for efficient communication in distributed training and inference has sparked significant interest in reconfigurable photonic interconnects across both academia and industry. Mixture-of-Experts (MoE) models, with their highly skewed communication patterns, present a natural opportunity for such circuit-switched fabrics. However, existing approaches largely optimize communication in isolation, overlooking the interaction between communication and the expert computation that follows. In this paper, we revisit circuit scheduling for all-to-all communication in MoE execution. We show that the dispatch–compute–combine structure fundamentally challenges classical scheduling techniques such as Birkhoff–von Neumann (BvN) decomposition. First, MoE communication matrices are rarely doubly stochastic, introducing significant scheduling bubbles in BvN-based schedules. Second, while decomposition enables communication–compute overlap, the excessive number of matchings produced by BvN fragments execution into small batches, leading to severe compute inefficiencies due to fixed execution overheads. Motivated by these observations, we explore a simple greedy max-weight decomposition strategy that bounds the number of matchings while preserving large batch sizes per matching. Despite its simplicity, the approach significantly improves overlap efficiency, reduces compute overheads, and approaches the performance of an ideal congestion-free all-to-all.
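A greedy max-weight decomposition can be sketched in a few lines. The sketch makes several assumptions that do not come from the paper:

- Each round picks the permutation with the largest total remaining traffic. It finds this by brute force, so it only suits small n.
- Every selected source–destination pair is drained completely, so each configuration is held for its largest flow.
- Each non-empty batch an expert receives costs a hypothetical fixed `OVERHEAD` plus `PER_TOKEN` per token.

The names `max_weight_permutation`, `greedy_max_weight_schedule`, and `expert_compute_time` are illustrative, not the authors' code.

```python
from itertools import permutations

# Hypothetical per-batch cost model for an expert kernel (arbitrary units).
OVERHEAD = 10.0   # fixed cost paid once per non-empty batch
PER_TOKEN = 1.0   # marginal cost per token in the batch


def max_weight_permutation(m):
    """Permutation p (source i -> destination p[i]) maximizing remaining traffic.
    Brute force over all n! permutations, so only for tiny examples."""
    n = len(m)
    return max(permutations(range(n)),
               key=lambda p: sum(m[i][p[i]] for i in range(n)))


def greedy_max_weight_schedule(m, max_matchings=None):
    """Greedy decomposition sketch. Each round takes a max-weight matching and
    drains every selected pair, holding the circuits for the largest flow.
    Stops when no traffic remains or after max_matchings rounds; leftover
    traffic is returned so the caller can see what was not scheduled."""
    n = len(m)
    rem = [row[:] for row in m]
    schedule = []  # entries: (permutation, duration, {destination: tokens})
    while any(v > 0 for row in rem for v in row):
        if max_matchings is not None and len(schedule) >= max_matchings:
            break
        p = max_weight_permutation(rem)
        # Some positive entry remains, so the best matching has a non-empty batch.
        batches = {p[i]: rem[i][p[i]] for i in range(n) if rem[i][p[i]] > 0}
        for i in range(n):
            rem[i][p[i]] = 0
        schedule.append((p, max(batches.values()), batches))
    return schedule, rem


def expert_compute_time(schedule, n):
    """Total compute per destination expert: OVERHEAD + PER_TOKEN * tokens
    for every non-empty batch it receives."""
    t = [0.0] * n
    for _, _, batches in schedule:
        for dest, tokens in batches.items():
            t[dest] += OVERHEAD + PER_TOKEN * tokens
    return t


# Hypothetical skewed dispatch matrix: traffic[i][j] = tokens GPU i sends to GPU j.
traffic = [[0, 6, 1],
           [2, 0, 1],
           [1, 1, 0]]

schedule, leftover = greedy_max_weight_schedule(traffic)
for p, duration, batches in schedule:
    print(p, duration, batches)
print(expert_compute_time(schedule, len(traffic)))  # [23.0, 27.0, 22.0]
```

Every nonzero pair is served in a single batch, so this sketch launches exactly one expert batch per nonzero entry (six here). Under the same cost model, splitting the 6-token flow from GPU 0 to GPU 1 across two matchings (for example 5 + 1) would cost 2 × 10 + 6 = 26 units instead of 16. That is the kind of fixed-overhead penalty the abstract attributes to BvN's many small matchings. A practical scheduler would replace the brute-force search with a polynomial-time assignment solver such as the Hungarian algorithm.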

