S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting
This work improves time series forecasting for applications requiring robust predictions, though its contribution is incremental, since it combines existing methods.
The paper tackles multivariate time series forecasting by addressing two limitations of existing multi-scale models: variates are processed independently, and multi-scale representations are learned separately. It proposes S2TX, which integrates a Mamba model and a Transformer through cross-attention and achieves state-of-the-art results on seven benchmark datasets with low memory usage.
Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns. Despite their state-of-the-art performance, we identify two potential areas for improvement. First, the variates of the multivariate time series are processed independently. Moreover, the multi-scale (long and short range) representations are learned separately by two independent models without communication. In light of these concerns, we propose State Space Transformer with cross-attention (S2TX). S2TX employs a cross-attention mechanism to integrate a Mamba model for extracting long-range cross-variate context and a Transformer model with local window attention to capture short-range representations. By cross-attending to the global context, the Transformer model further facilitates variate-level interactions as well as local/global communications. Comprehensive experiments on seven classic long-short range time-series forecasting benchmark datasets demonstrate that S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.
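The two attention patterns named in the abstract can be illustrated with a small sketch. This is not the S2TX implementation: it omits learned projections, multiple heads, patching, and the Mamba model itself. The names `attend`, `local_window_attention`, `cross_attention`, and the toy vectors are hypothetical, and `global_ctx` is only a stand-in for whatever long-range context a state-space model would produce.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys, values, allowed=None):
    """Scaled dot-product attention; allowed(i, j) optionally masks query/key pairs."""
    d = len(keys[0])
    out = []
    for i, q in enumerate(queries):
        idx = [j for j in range(len(keys)) if allowed is None or allowed(i, j)]
        weights = softmax([dot(q, keys[j]) / math.sqrt(d) for j in idx])
        out.append([sum(w * values[j][c] for w, j in zip(weights, idx))
                    for c in range(len(values[0]))])
    return out

def local_window_attention(tokens, window):
    # Each token attends only to positions within `window` steps of itself.
    return attend(tokens, tokens, tokens,
                  allowed=lambda i, j: abs(i - j) <= window)

def cross_attention(local_tokens, global_context):
    # Short-range tokens act as queries; the global context supplies keys and values.
    return attend(local_tokens, global_context, global_context)

# Toy inputs (assumed for illustration only).
short_range = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
global_ctx = [[0.2, 0.1], [0.9, -0.3], [-0.4, 0.7]]

local = local_window_attention(short_range, window=1)
fused = cross_attention(local, global_ctx)
```

Here `fused` has one vector per short-range token, each a weighted average of the global context vectors. This is the sense in which cross-attending lets local representations read from the global context.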