Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting
This work addresses accurate long-term forecasting for researchers and industries that rely on time series data. It is an incremental advance that integrates U-Net-style connections with sparse attention.
The paper tackles far horizon time series forecasting by proposing Yformer, a U-Net inspired transformer architecture that addresses the resolution mismatch in sparse transformers, resulting in average improvements of 19.82% MSE and 13.62% MAE in univariate settings and 18.41% MSE and 11.85% MAE in multivariate settings compared to state-of-the-art methods.
Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make effective decisions. Recent research has shown the superior performance of transformer-based architectures, especially in the regime of far horizon time series forecasting. However, current state-of-the-art sparse transformer architectures fail to couple down- and upsampling procedures to produce outputs at a resolution similar to that of the input. We propose the Yformer model, based on a novel Y-shaped encoder-decoder architecture that (1) uses direct connections from each downscaled encoder layer to the corresponding upsampled decoder layer in a U-Net inspired architecture, (2) combines the downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks with the addition of an auxiliary reconstruction loss. Extensive experiments with relevant baselines on four benchmark datasets demonstrate an average improvement over the current state of the art of 19.82% in MSE and 13.62% in MAE for the univariate setting, and 18.41% in MSE and 11.85% in MAE for the multivariate setting.
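The three ingredients named in the abstract can be illustrated with a toy sketch in plain Python. This is not the Yformer implementation: the pairwise averaging in `downsample`, the value repetition in `upsample`, the averaging merge in `y_pass`, and the `recon_weight` parameter are all hypothetical stand-ins chosen for illustration, and the sparse attention layers are omitted entirely. The sketch only shows how each coarser encoder level can be fed directly to the decoder level of matching resolution, and how a reconstruction term can be added to the forecasting loss.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def downsample(x: Sequence[float]) -> List[float]:
    """Halve the resolution by averaging adjacent pairs.

    Assumption: a fixed average stands in for a learned downscaling layer.
    """
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Sequence[float]) -> List[float]:
    """Double the resolution by repeating each value (hypothetical stand-in)."""
    return [v for v in x for _ in range(2)]


def y_pass(x: Sequence[float], depth: int) -> List[float]:
    """Toy encoder-decoder pass with U-Net style skip connections.

    len(x) must be divisible by 2 ** depth so every level halves cleanly.
    """
    # Encoder: keep the representation at every resolution.
    skips = [list(x)]
    for _ in range(depth):
        skips.append(downsample(skips[-1]))

    # Decoder: start from the coarsest level, upsample, and merge with the
    # encoder output of matching resolution (the direct connection).
    h = skips[-1]
    for level in range(depth - 1, -1, -1):
        up = upsample(h)
        h = [(u + s) / 2 for u, s in zip(up, skips[level])]
    return h


def combined_loss(
    recon: Sequence[float],
    past: Sequence[float],
    forecast: Sequence[float],
    future: Sequence[float],
    recon_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Forecasting MSE plus a weighted auxiliary reconstruction MSE."""
    return mse(forecast, future) + recon_weight * mse(recon, past)
```

Calling `y_pass` on a series of length 8 with `depth=2` returns a series of length 8 again, which is the property the abstract highlights: the output is produced at the same resolution as the input even though the intermediate levels are coarser.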