LATST: Are Transformers Necessarily Complex for Time-Series Forecasting
This work addresses a key bottleneck in time-series forecasting, the gap between Transformer models and simpler linear baselines on long-term tasks, and offers researchers and practitioners an incremental improvement over existing Transformer models.
The paper tackles the underperformance of Transformer-based models relative to simpler linear baselines in multivariate long-term time-series forecasting. It introduces LATST, an approach that mitigates entropy collapse and training instability, and reports competitive performance with fewer parameters than some linear models on certain datasets.
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short of simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, two common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST achieves competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness.
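To make the term "entropy collapse" concrete, the following minimal sketch measures the mean Shannon entropy of softmax attention rows and shows how it shrinks as the attention logits grow in magnitude. It assumes the usual reading of the term, namely that attention distributions become nearly one-hot. It is not the LATST method; the function names, the sequence length of 16, and the scale factor of 50 are illustrative assumptions.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def row_entropy(probs):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_attention_entropy(logits):
    """Average entropy over all query rows of a logit matrix."""
    return sum(row_entropy(softmax(r)) for r in logits) / len(logits)


random.seed(0)
T = 16  # hypothetical sequence length

# Moderate logits: attention is spread over many positions.
logits = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(T)]

# Same logits scaled up (hypothetical factor): attention becomes nearly one-hot.
sharp_logits = [[50.0 * v for v in row] for row in logits]

max_entropy = math.log(T)  # entropy of a uniform distribution over T positions
diffuse_entropy = mean_attention_entropy(logits)
collapsed_entropy = mean_attention_entropy(sharp_logits)

print(f"upper bound log(T):      {max_entropy:.4f}")
print(f"moderate logits entropy: {diffuse_entropy:.4f}")
print(f"scaled logits entropy:   {collapsed_entropy:.4f}")
```

Tracking a quantity like this per attention head during training is one way to observe whether the attention distributions are collapsing. How LATST itself counteracts the collapse is not specified in this summary.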