Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration
For practitioners who need accurate and efficient time series forecasting, this work offers a practical improvement over existing neural methods, although it is an incremental extension of established MoE techniques.
The paper introduces a Mixture-of-Experts framework for time series forecasting that incorporates expert-specific losses into training, combined with partial online learning, achieving improved accuracy and computational efficiency over Transformers and WaveNet across multiple datasets.
We propose a novel adaptive Mixture-of-Experts (MoE) framework for time series forecasting that strengthens expert specialization by incorporating expert-specific loss information directly into training. The overall objective combines the base forecasting loss with expert-specific losses, so that each expert's prediction error shapes training alongside the global forecasting error. The framework is paired with a partial online learning strategy that incrementally updates both the gating mechanism and the expert parameters, which significantly reduces computational cost by removing the need for repeated full retraining. Integrating expert-level loss awareness with online optimization thus improves learning efficiency while maintaining strong predictive performance. Empirical results on economic, tourism, and energy datasets of varying frequencies show that the proposed approach generally outperforms both statistical methods and state-of-the-art neural network models, such as Transformers and WaveNet, in forecasting accuracy and computational efficiency. Ablation studies further confirm that integrating expert-specific losses contributes to the gain in predictive performance.
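
The abstract does not state the exact objective or update rule, so the following is only a minimal sketch of how a combined loss and an incremental update could look, not the authors' implementation. It assumes linear experts, a softmax gate, squared-error losses, a gate-weighted sum of per-expert errors as the expert-specific term, and a single gradient step per new observation as a stand-in for "partial online learning". The names `combined_loss`, `online_step`, the weight `lam`, and the learning rate `lr` are all hypothetical.

```python
import math


def softmax(z):
    # Numerically stable softmax over a list of scores.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def forward(x, expert_w, gate_w):
    """x: lag features; expert_w[k], gate_w[k]: weight vectors for expert k."""
    preds = [dot(w, x) for w in expert_w]          # one forecast per expert
    gates = softmax([dot(w, x) for w in gate_w])   # mixing weights, sum to 1
    mixture = dot(gates, preds)                    # gated forecast
    return mixture, preds, gates


def combined_loss(x, y, expert_w, gate_w, lam=0.5):
    # Assumed objective: base loss on the mixture plus a gate-weighted
    # sum of each expert's own squared error, scaled by lam.
    mixture, preds, gates = forward(x, expert_w, gate_w)
    base = (mixture - y) ** 2
    expert_term = sum(g * (p - y) ** 2 for g, p in zip(gates, preds))
    return base + lam * expert_term, base, expert_term


def online_step(x, y, expert_w, gate_w, lam=0.5, lr=0.01):
    # One gradient step on a single new observation (assumed update rule),
    # touching both the experts and the gate, with no full refit.
    mixture, preds, gates = forward(x, expert_w, gate_w)
    err = mixture - y
    # Derivative of the combined loss with respect to each gate weight g_k.
    a = [2.0 * err * p + lam * (p - y) ** 2 for p in preds]
    a_bar = dot(gates, a)
    new_expert_w, new_gate_w = [], []
    for k in range(len(preds)):
        d_pred = 2.0 * gates[k] * (err + lam * (preds[k] - y))  # dL/dp_k
        d_logit = gates[k] * (a[k] - a_bar)                     # dL/dz_k via softmax
        new_expert_w.append([w - lr * d_pred * xi for w, xi in zip(expert_w[k], x)])
        new_gate_w.append([w - lr * d_logit * xi for w, xi in zip(gate_w[k], x)])
    return new_expert_w, new_gate_w
```

In a streaming setting, `online_step` would be called once per incoming observation, which is the sense in which the cost of repeated full retraining is avoided in this sketch. Setting `lam=0` recovers a plain MoE trained on the mixture error alone, which is roughly the comparison an ablation of the expert-specific term would make.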