LG · AI · Oct 2, 2022

Grouped self-attention mechanism for a memory-efficient Transformer

arXiv:2210.00440v2 · 4 citations · h-index: 12
Originality: Incremental advance
AI Analysis

The paper addresses memory efficiency in time-series forecasting, but the advance is incremental because it builds on existing Transformer architectures.

The paper tackles the high computational complexity of Transformers on long time-series data by proposing Grouped Self-Attention and Compressed Cross-Attention modules, which achieve O(l) complexity in the sequence length l and performance comparable to or better than existing methods.

Time-series data analysis is important because numerous real-world tasks, such as forecasting weather, electricity consumption, and stock markets, involve predicting data that vary over time. Time-series data are generally recorded over long observation periods and form long sequences, owing to their periodic characteristics and long-range dependencies over time. Thus, capturing long-range dependencies is an important factor in time-series forecasting. To address this, we propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA). With both modules, we achieve space and time complexity of order $O(l)$ in the sequence length $l$ under small hyperparameter limitations, and the model can capture locality while considering global information. Experiments on time-series datasets show that our proposed model reduces computational complexity while performing comparably to or better than existing methods.
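To make the $O(l)$ claim concrete, the sketch below counts query-key score entries for standard self-attention and for a generic block-local scheme with a fixed group size. This is an illustration under stated assumptions only: the abstract does not specify how GSA forms its groups, so the non-overlapping grouping, the function names, and the group size `g` are all hypothetical and are not the paper's method.

```python
def full_attention_pairs(l: int) -> int:
    """Number of query-key score entries in standard self-attention."""
    return l * l


def grouped_attention_pairs(l: int, g: int) -> int:
    """Score entries when attention is restricted to non-overlapping groups of size g.

    Assumption: generic block-local attention, used only to illustrate scaling.
    It is not the paper's GSA module, and g is a hypothetical hyperparameter.
    """
    if g <= 0:
        raise ValueError("group size must be positive")
    full_groups, remainder = divmod(l, g)
    # Each full group costs g * g; a trailing partial group costs remainder^2.
    return full_groups * g * g + remainder * remainder


if __name__ == "__main__":
    for l in (1_000, 2_000, 4_000):
        print(l, full_attention_pairs(l), grouped_attention_pairs(l, g=100))
```

With `g` held fixed and dividing `l`, the grouped count equals `l * g`, so doubling the sequence length doubles the cost, whereas the full-attention count quadruples. A fixed group size on its own captures only local interactions; how the paper recovers global information is not described in the abstract and is not modeled here.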

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
