Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function
This work addresses the challenge of deploying distributed mixture-of-experts (MoE) systems in wireless environments and offers an incremental improvement for edge computing applications.
The paper adapts distributed mixture-of-experts (MoE) systems to wireless networks by introducing a channel-aware gating function that incorporates channel conditions into expert selection. It reports that this approach outperforms traditional MoE models.
In a distributed mixture-of-experts (MoE) system, a server collaborates with multiple specialized expert clients to perform inference. The server extracts features from input data and dynamically selects experts based on their areas of specialization to produce the final output. Although MoE models are widely valued for their flexibility and performance benefits, adapting distributed MoEs to operate effectively in wireless networks has remained unexplored. In this work, we introduce a novel channel-aware gating function for wireless distributed MoE, which incorporates channel conditions into the MoE gating mechanism. To train the channel-aware gating, we simulate various signal-to-noise ratios (SNRs) for each expert's communication channel and add noise to the features distributed to the experts according to these SNRs. The gating function then uses both the features and the SNRs to optimize expert selection. Unlike conventional MoE models, which consider only how well the features align with the experts' specializations, our approach also accounts for the impact of channel conditions on expert performance. Experimental results demonstrate that the proposed channel-aware gating scheme outperforms traditional MoE models.
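The two mechanisms described in the abstract, noise injection at a simulated SNR and gating on both features and SNRs, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the additive white Gaussian noise model, the linear gate, the top-k softmax, and the names `add_awgn`, `channel_aware_gate`, `w_feat`, and `w_snr` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def add_awgn(features, snr_db, rng):
    """Add white Gaussian noise so the output has roughly the given SNR in dB.

    Assumption: the channel is modeled as additive white Gaussian noise.
    """
    signal_power = np.mean(features ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=features.shape)
    return features + noise


def channel_aware_gate(features, snrs_db, w_feat, w_snr, top_k=1):
    """Score each expert from the features AND its channel SNR, keep the top-k.

    Assumption: a linear gate; the gate in the paper may be any trainable network.
    """
    # Feature term: how well the input matches each expert's specialization.
    # SNR term: how reliable the link to each expert currently is.
    logits = features @ w_feat + w_snr * snrs_db  # shape: (num_experts,)
    selected = np.argsort(logits)[::-1][:top_k]   # indices of the best experts
    shifted = np.exp(logits[selected] - logits[selected].max())
    weights = shifted / shifted.sum()             # softmax over selected experts
    return selected, weights


rng = np.random.default_rng(0)
dim, num_experts = 8, 4

features = rng.normal(size=dim)                       # server-side features
snrs_db = rng.uniform(-5.0, 20.0, size=num_experts)   # simulated per-expert SNRs

# Hypothetical, untrained gate parameters.
w_feat = rng.normal(size=(dim, num_experts))
w_snr = np.full(num_experts, 0.1)

# Each expert receives its own noisy copy of the features.
noisy_per_expert = [add_awgn(features, snr, rng) for snr in snrs_db]

selected, weights = channel_aware_gate(features, snrs_db, w_feat, w_snr, top_k=2)
print("selected experts:", selected, "mixing weights:", weights)
```

In this sketch a conventional gate corresponds to setting `w_snr` to zero, so that only the feature term influences selection. During training, the SNRs would be resampled for each batch so that the gate sees a range of channel conditions.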