Fourier-Mixed Window Attention: Accelerating Informer for Long Sequence Time-Series Forecasting
This work addresses slow inference in long sequence time-series forecasting, a problem relevant to both researchers and practitioners, and offers an incremental improvement over existing methods such as Informer.
The authors tackle the computational inefficiency of Informer for long sequence time-series forecasting by proposing FWin, a fast local-global window-based attention method that speeds up inference by a factor of 1.6 to 2 while improving prediction accuracy on univariate and multivariate datasets.
We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting. Window attention, being local, yields considerable computational savings, but it cannot capture global token information; this is compensated by a subsequent Fourier transform block. Our method, named FWin, does not rely on the query sparsity hypothesis or the empirical approximation underlying the ProbSparse attention of Informer. Through experiments on univariate and multivariate datasets, we show that FWin transformers improve the overall prediction accuracy of Informer while accelerating its inference by a factor of 1.6 to 2. We also provide a mathematical definition of FWin attention and prove that it is equivalent to canonical full attention under the block diagonal invertibility (BDI) condition of the attention matrix. The BDI condition is shown experimentally to hold with high probability on typical benchmark datasets.
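To make the local-global idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes non-overlapping windows of equal size, single-head attention, and an FNet-style Fourier block (a 2D FFT over the sequence and hidden dimensions, keeping the real part). Layer normalization, feed-forward sublayers, and Informer's distilling layers and decoder are omitted. The names `window_attention`, `fourier_mix`, and `fwin_like_block` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(q, k, v, window):
    """Softmax attention computed independently inside non-overlapping windows.

    q, k, v: arrays of shape (L, d); L must be a multiple of `window`.
    Only window x window score blocks are formed, so the cost is
    O(L * window * d) rather than O(L^2 * d) for full attention.
    """
    L, d = q.shape
    if L % window != 0:
        raise ValueError("sequence length must be a multiple of the window size")
    n = L // window
    # Split the sequence into n contiguous windows of `window` tokens each.
    qw = q.reshape(n, window, d)
    kw = k.reshape(n, window, d)
    vw = v.reshape(n, window, d)
    scores = qw @ kw.transpose(0, 2, 1) / np.sqrt(d)  # (n, window, window)
    out = softmax(scores, axis=-1) @ vw               # (n, window, d)
    return out.reshape(L, d)


def fourier_mix(x):
    """Assumed FNet-style global mixing: 2D FFT over (sequence, hidden), real part."""
    return np.real(np.fft.fft2(x))


def fwin_like_block(x, wq, wk, wv, window):
    """Hypothetical local-global block: window attention, then Fourier mixing.

    Each sublayer is wrapped in a residual connection; normalization is omitted.
    """
    h = x + window_attention(x @ wq, x @ wk, x @ wv, window)
    return h + fourier_mix(h)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d, w = 96, 16, 24
    x = rng.standard_normal((L, d))
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    y = fwin_like_block(x, wq, wk, wv, window=w)
    print(y.shape)  # (96, 16)
```

In this sketch, `window_attention` is equivalent to full attention with a block-diagonal mask: a token never attends outside its window. The Fourier block lets every output position depend on every input position, at the cost of an FFT rather than an L x L score matrix. Choosing `window = L` recovers ordinary full attention.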