LG Nov 14, 2025

FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models

Yonatan Dukler, Guihong Li, Deval Shah, Vikram Appia, Emad Barsoum

arXiv:2511.11505v14.1 h-index: 11 Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficiency bottlenecks in distributed training and inference for large-scale MoE models. It represents an incremental improvement.

The paper tackles blocking communication in Mixture of Experts models by introducing FarSkip-Collective, which modifies the model architecture so that computation can overlap with communication. Converted models ranging from 16B to 109B parameters reach accuracy on par with their original releases; the converted Llama 4 Scout (109B) is within 1% of its instruction-tuned release on average.

Blocking communication presents a major hurdle in running MoEs efficiently in distributed settings. To address this, we present FarSkip-Collective, which modifies the architecture of modern models to enable overlapping of their computation with communication. Our approach modifies the architecture to skip connections in the model, and it is unclear a priori whether the modified architecture can remain as capable, especially for large state-of-the-art models and when all of the model layers are modified. We answer this question in the affirmative and fully convert a series of state-of-the-art models ranging from 16B to 109B parameters to enable overlapping of their communication while achieving accuracy on par with their original open-source releases. For example, we convert Llama 4 Scout (109B) via self-distillation and achieve accuracy within 1% of its instruction-tuned release, averaged across a wide range of downstream evaluations. In addition to demonstrating retained accuracy of the large modified models, we realize the benefits of FarSkip-Collective through optimized implementations that explicitly overlap communication with computation, accelerating both training and inference in existing frameworks.
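The general idea of overlapping communication with computation can be illustrated with a toy sketch. This is not the authors' implementation or their exact connectivity, which the abstract does not spell out. It assumes, purely for illustration, that a layer may start computing from a residual stream that does not yet include the previous layer's communicated output. The names `fake_all_reduce`, `compute`, `blocking_forward`, and `overlapped_forward` are hypothetical, and the collective is simulated with a sleep on a background thread.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_all_reduce(x, delay=0.01):
    """Hypothetical stand-in for a blocking collective: waits, returns x unchanged."""
    time.sleep(delay)
    return list(x)


def compute(x, work=0.01):
    """Hypothetical stand-in for a layer's local computation (halves each value)."""
    time.sleep(work)
    return [0.5 * v for v in x]


def add(a, b):
    """Element-wise residual add."""
    return [u + v for u, v in zip(a, b)]


def blocking_forward(x, num_layers=4):
    # Baseline: every layer waits for its communication to finish
    # before the next layer's computation can begin.
    for _ in range(num_layers):
        y = fake_all_reduce(compute(x))
        x = add(x, y)
    return x


def overlapped_forward(x, num_layers=4):
    # Assumed relaxed dependency: each layer computes on a residual stream
    # that lacks the previous layer's output, so that output's communication
    # can stay in flight on a background thread during the computation.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for _ in range(num_layers):
            y = compute(x)                    # overlaps with the pending communication
            if pending is not None:
                x = add(x, pending.result())  # fold in the late-arriving output
            pending = pool.submit(fake_all_reduce, y)
        x = add(x, pending.result())          # drain the last communication
    return x


if __name__ == "__main__":
    print(blocking_forward([1.0]))    # [5.0625]
    print(overlapped_forward([1.0]))  # [3.75]
```

The two functions return different values for the same input. In this toy setting, relaxing the dependency changes the function the network computes, which is consistent with the abstract's point that it is unclear a priori whether the modified architecture stays as capable, and with its use of self-distillation to recover accuracy.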
