LG · AI · Oct 2, 2022

Grouped self-attention mechanism for a memory-efficient Transformer

Bumjun Jung, Yusuke Mukuta, Tatsuya Harada

arXiv:2210.00440

Originality: Incremental advance

AI Analysis

This work addresses memory efficiency in time-series forecasting, but it is an incremental advance because it builds on existing Transformer architectures.

The paper tackles the high computational complexity of Transformers on long time-series inputs by proposing Grouped Self-Attention and Compressed Cross-Attention modules. Together they achieve $O(l)$ complexity in the sequence length $l$, with performance comparable to or better than existing methods.

Time-series data analysis is important because numerous real-world tasks, such as forecasting weather, electricity consumption, and stock markets, involve predicting data that vary over time. Because of their periodic characteristics and long-range dependencies over time, time-series data are generally recorded over long observation periods, producing long sequences. Capturing long-range dependency is therefore an important factor in time-series forecasting, but the self-attention of a standard Transformer has time and space complexity quadratic in the sequence length, which becomes costly for such long inputs. To address this, we propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA). With both modules, our model achieves space and time complexity of order $O(l)$ in the sequence length $l$ under small hyperparameter limitations, and it can capture locality while considering global information. Experiments on time-series datasets show that the proposed model reduces computational complexity while achieving performance comparable to or better than existing methods.
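The linear-complexity claim can be illustrated with a toy sketch. It rests on two assumptions made for illustration only. First, grouped self-attention is modeled as plain block-local attention over fixed-size contiguous groups. Second, compressed cross-attention is modeled as queries attending to an average-pooled encoder memory with a fixed number of slots. The names `grouped_self_attention`, `compress`, `compressed_cross_attention`, `group_size`, and `num_slots` are hypothetical. The paper's actual GSA and CCA modules, which according to the abstract also consider global information, may differ. Projections, multiple heads, and batching are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are sequence positions, columns are features


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention; computes len(queries) * len(keys) scores."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def grouped_self_attention(x: Matrix, group_size: int) -> Matrix:
    """Hypothetical GSA stand-in (assumption): each position attends only inside
    its own contiguous group, so about l * group_size scores are computed."""
    out: Matrix = []
    for start in range(0, len(x), group_size):
        block = x[start:start + group_size]
        out.extend(attend(block, block, block))  # no Q/K/V projections, for brevity
    return out


def compress(memory: Matrix, num_slots: int) -> Matrix:
    """Hypothetical compression (assumption): average-pool the memory into
    num_slots contiguous segments. Requires len(memory) >= num_slots."""
    n, d = len(memory), len(memory[0])
    pooled = []
    for i in range(num_slots):
        seg = memory[i * n // num_slots:(i + 1) * n // num_slots]
        pooled.append([sum(row[c] for row in seg) / len(seg) for c in range(d)])
    return pooled


def compressed_cross_attention(queries: Matrix, memory: Matrix, num_slots: int) -> Matrix:
    """Hypothetical CCA stand-in (assumption): queries attend to a fixed number
    of pooled memory slots, so len(queries) * num_slots scores are computed."""
    slots = compress(memory, num_slots)
    return attend(queries, slots, slots)


def full_score_count(l: int) -> int:
    """Number of attention scores for full self-attention over length l."""
    return l * l


def grouped_score_count(l: int, group_size: int) -> int:
    """Number of attention scores when attention is restricted to groups."""
    return sum(min(group_size, l - s) ** 2 for s in range(0, l, group_size))


print(full_score_count(1024))         # 1048576
print(grouped_score_count(1024, 16))  # 16384
```

When `group_size` and `num_slots` are held fixed, both score counts grow linearly in $l$, in line with the "small hyperparameter limitations" condition on the $O(l)$ claim. Full self-attention, by contrast, grows as $l^2$.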

