LG Feb 20, 2025

TimeDistill: Efficient Long-Term Time Series Forecasting with MLP via Cross-Architecture Distillation

Juntong Ni, Zewen Liu, Shiyu Wang, Ming Jin, Wei Jin

arXiv:2502.15016v1, 19 citations, h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses efficiency issues in time series forecasting for large-scale deployment, though it is incremental as it builds on existing knowledge distillation techniques.

The paper targets the high computational and storage costs of Transformer- and CNN-based methods for long-term time series forecasting. It proposes TimeDistill, a cross-architecture knowledge distillation framework that transfers the patterns these models capture to lightweight MLPs. Reported results are up to 18.6% better MLP performance, up to 7x faster inference, and 130x fewer parameters.

Transformer-based and CNN-based methods demonstrate strong performance in long-term time series forecasting. However, their high computational and storage requirements can hinder large-scale deployment. To address this limitation, we propose integrating lightweight MLP with advanced architectures using knowledge distillation (KD). Our preliminary study reveals different models can capture complementary patterns, particularly multi-scale and multi-period patterns in the temporal and frequency domains. Based on this observation, we introduce TimeDistill, a cross-architecture KD framework that transfers these patterns from teacher models (e.g., Transformers, CNNs) to MLP. Additionally, we provide a theoretical analysis, demonstrating that our KD approach can be interpreted as a specialized form of mixup data augmentation. TimeDistill improves MLP performance by up to 18.6%, surpassing teacher models on eight datasets. It also achieves up to 7X faster inference and requires 130X fewer parameters. Furthermore, we conduct extensive evaluations to highlight the versatility and effectiveness of TimeDistill.
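The abstract does not spell out the loss terms, so the following is only a minimal NumPy sketch of how distilling multi-scale (temporal) and multi-period (frequency) patterns from a teacher forecast into a student forecast might look. It is not the authors' implementation. The function names, the average-pooling scales, the softmax temperature, and the weights `alpha` and `beta` are all assumptions made for illustration.

```python
import numpy as np


def downsample(x, scale):
    """Average-pool the last (time) axis over non-overlapping windows of length `scale`."""
    n = (x.shape[-1] // scale) * scale  # drop any trailing remainder
    return x[..., :n].reshape(*x.shape[:-1], n // scale, scale).mean(axis=-1)


def multi_scale_loss(student, teacher, scales=(1, 2, 4)):
    """Mean squared error between student and teacher at several temporal resolutions.

    The choice of scales is an assumption, not taken from the paper.
    """
    losses = [
        np.mean((downsample(student, s) - downsample(teacher, s)) ** 2)
        for s in scales
    ]
    return float(sum(losses) / len(scales))


def period_distribution(x, temperature=1.0):
    """Softmax over FFT amplitudes along time, with the DC component dropped."""
    amp = np.abs(np.fft.rfft(x, axis=-1))[..., 1:]
    z = amp / temperature
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)


def multi_period_loss(student, teacher, temperature=1.0, eps=1e-12):
    """KL divergence from the teacher's to the student's frequency distribution."""
    p = period_distribution(teacher, temperature)
    q = period_distribution(student, temperature)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))


def distill_loss(student, teacher, target, alpha=0.5, beta=0.5):
    """Supervised MSE plus weighted distillation terms (weights are hypothetical)."""
    supervised = float(np.mean((student - target) ** 2))
    return (
        supervised
        + alpha * multi_scale_loss(student, teacher)
        + beta * multi_period_loss(student, teacher)
    )
```

In this sketch, `student` and `teacher` are forecast arrays of shape `(batch, horizon)`. Both distillation terms vanish when the student reproduces the teacher exactly, and the frequency term penalises a student whose dominant period differs from the teacher's even if the pointwise error is moderate.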

