LG · AI · May 24

AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

Rui Wang, Renhao Xue, Ray Razi, Huan Song, Hannah R. Marlowe

arXiv:2605.25166 · 80.2

Predicted impact: top 15% in LG · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners needing efficient and interpretable time series forecasting, AME-TS provides a method to leverage sparse computation without sacrificing accuracy, though improvements are incremental over existing MoE approaches.

AME-TS introduces a structure-guided sparse mixture-of-experts model for time series forecasting that aligns expert routing with interpretable temporal features. It outperforms existing foundation models at small scales and remains competitive at larger scales while activating fewer parameters, and shows more stable expert specialization during fine-tuning on the M5 dataset.

Time series forecasting models are increasingly scaled through large Transformer backbones, yet most existing approaches process all series through a shared dense computation path despite substantial heterogeneity in temporal structure. Mixture-of-Experts (MoE) offers a natural alternative by enabling conditional computation, but standard MoE routing leaves expert specialization weakly identified and often unstable during downstream adaptation. We propose AME-TS, a structure-guided sparse time series foundation model that aligns expert routing with interpretable temporal structure. AME-TS first uses a lightweight regime predictor to estimate series-level descriptors, including forecastability, seasonality, trend, and sparsity, and maps them to a soft structural prior over experts. This series-level prior guides token-level routing during training, encouraging structure-aligned specialization. On the GIFT-Eval benchmark, AME-TS delivers a strong accuracy-efficiency tradeoff across model scales: it substantially outperforms existing time series foundation models at small model scales and remains competitive with the strongest models at larger scales, while activating substantially fewer parameters through sparse routing. We further show that AME-TS learns more interpretable routing geometry and substantially more stable expert specialization than standard MoE during fine-tuning on the M5 dataset. These results suggest that structure-aware routing is an effective and reliable way to realize the benefits of sparse expert models for time series forecasting.
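The routing idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear descriptor-to-expert map `W`, the temperature, the top-k gating, and the KL-based alignment term are all assumptions chosen for illustration, and the descriptor values are made up. It shows one plausible way a series-level prior over experts could regularize token-level routing.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def structural_prior(descriptors, W, temperature=1.0):
    # Assumed form: a linear map from series-level descriptors
    # (forecastability, seasonality, trend, sparsity) to expert logits,
    # followed by a softmax to obtain a soft prior over experts.
    return softmax(descriptors @ W / temperature)

def route_tokens(token_logits, k):
    # Standard top-k sparse gating: keep the k largest gate
    # probabilities per token and renormalize them to sum to 1.
    probs = softmax(token_logits, axis=-1)
    topk = np.argsort(-probs, axis=-1)[:, :k]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk, 1.0, axis=-1)
    gates = probs * mask
    return gates / gates.sum(axis=-1, keepdims=True), probs

def prior_alignment_loss(probs, prior, eps=1e-9):
    # Assumed regularizer: KL(prior || mean token routing distribution),
    # which pulls the average token-level routing toward the series prior.
    mean_route = probs.mean(axis=0)
    return float(np.sum(prior * (np.log(prior + eps) - np.log(mean_route + eps))))

rng = np.random.default_rng(0)
num_experts, num_tokens, k = 4, 16, 2

# Hypothetical descriptor vector for one series.
descriptors = np.array([0.8, 0.6, 0.1, 0.0])
W = rng.normal(size=(4, num_experts))  # hypothetical learned weights

prior = structural_prior(descriptors, W)
token_logits = rng.normal(size=(num_tokens, num_experts))
gates, probs = route_tokens(token_logits, k)
loss = prior_alignment_loss(probs, prior)
```

In a training loop, a term like `loss` would presumably be added to the forecasting objective with a small weight, so that the prior guides routing without fully determining it.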
