XicorAttention: Time Series Transformer Using Attention with Nonlinear Correlation
This work addresses a limitation of existing attention mechanisms in time series forecasting, namely that they may not adequately capture nonlinear dependencies, and represents an incremental improvement over existing models.
The paper tackles the problem of capturing nonlinear dependencies in time series forecasting by proposing XicorAttention, a novel attention mechanism based on Chatterjee's rank correlation coefficient, which improves forecasting accuracy by up to approximately 9.1% compared to existing models.
Various Transformer-based models have been proposed for time series forecasting. These models leverage the self-attention mechanism to capture long-term temporal or variate dependencies in sequences. Existing methods can be divided into two approaches: (1) reducing the computational cost of attention by making the calculations sparse, and (2) reshaping the input data to aggregate temporal features. However, existing attention mechanisms may not adequately capture the nonlinear dependencies inherent in time series data, leaving room for improvement. In this study, we propose a novel attention mechanism based on Chatterjee's rank correlation coefficient, which measures nonlinear dependencies between variables. Specifically, we replace the matrix multiplication in standard attention mechanisms with this rank coefficient to measure the query-key relationship. Since computing Chatterjee's correlation coefficient involves sorting and ranking operations, which are not differentiable, we introduce a differentiable approximation employing SoftSort and SoftRank. We integrate the proposed mechanism, "XicorAttention," into several state-of-the-art Transformer models. Experimental results on real-world datasets demonstrate that incorporating nonlinear correlation into the attention mechanism improves forecasting accuracy by up to approximately 9.1% compared to existing models.
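To make the two ingredients named above more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes Chatterjee's rank correlation coefficient in its standard no-ties form, xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1), and builds a SoftSort-style relaxed permutation matrix as a row-wise softmax over negative absolute differences. The function names `chatterjee_xi` and `soft_sort`, the temperature `tau`, and the toy data are assumptions made for illustration. SoftRank and the way the paper wires these pieces into attention are not shown.

```python
import math


def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n of y on x, assuming no ties."""
    n = len(x)
    # Reorder the y values so that the paired x values are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    y_by_x = [y[i] for i in order]
    # ranks[i] = rank (1 = smallest) of the i-th reordered y value.
    ranks = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda i: y_by_x[i]), start=1):
        ranks[i] = rank
    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def soft_sort(s, tau=0.1):
    """SoftSort-style relaxation of the descending-sort permutation matrix of s.

    Row i is a softmax over -|sorted(s)[i] - s[j]| / tau, so each row
    sums to 1 and approaches a one-hot vector as tau goes to 0.
    """
    rows = []
    for target in sorted(s, reverse=True):
        logits = [-abs(target - v) / tau for v in s]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logits]
        norm = sum(weights)
        rows.append([w / norm for w in weights])
    return rows


# Hypothetical data: a parabola, i.e. a nonlinear and non-monotone dependency.
x = [-1.0 + 0.02 * i + 0.005 for i in range(100)]
y = [v * v for v in x]
print(chatterjee_xi(x, y))  # well above 0.9 despite the non-monotone shape

# Applying the relaxed permutation to s approximately sorts it (descending).
s = [0.3, 2.0, -1.0]
p = soft_sort(s, tau=0.01)
print([sum(p_ij * s_j for p_ij, s_j in zip(row, s)) for row in p])
```

In this sketch a smaller `tau` brings the relaxed matrix closer to a hard permutation, while a larger `tau` gives smoother weights. Note also that the coefficient is not symmetric: `chatterjee_xi(x, y)` and `chatterjee_xi(y, x)` generally differ.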