LG, CL, ML | Dec 10, 2021

Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences

Yifan Chen, Qi Zeng, Dilek Hakkani-Tur, Di Jin, Heng Ji, Yun Yang

arXiv:2112.05359v151.5629 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational bottleneck in processing long sequences for Transformer models, offering a practical improvement for applications like natural language processing, though it is incremental relative to prior methods like Linformer and Informer.

The paper tackles the inefficiency of Transformer self-attention on long sequences by proposing Skeinformer, which uses matrix sketching to accelerate self-attention while improving the accuracy of the attention approximation. On the Long Range Arena benchmark it outperforms alternative methods while using less time and memory.
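To make the sketching idea concrete, the block below writes out a generic sketched form of self-attention. The notation ($A$, $D$, $S$, $n$, $p$, $d$) is assumed here for illustration and may differ from the paper's own symbols; it is a schematic, not the paper's exact formulation.

```latex
% Exact self-attention for Q, K, V \in \mathbb{R}^{n \times p}
% (exp is applied entrywise; D holds the row sums of A):
\[
  \operatorname{Attn}(Q, K, V) = D^{-1} A V,
  \qquad
  A = \exp\!\left( Q K^{\top} / \sqrt{p} \right),
  \qquad
  D = \operatorname{diag}\!\left( A \mathbf{1}_n \right).
\]
% Forming A costs O(n^2 p) time and O(n^2) memory.

% Sketched approximation with a random S \in \mathbb{R}^{n \times d}, d \ll n,
% normalized so that \mathbb{E}[S S^{\top}] = I_n:
\[
  D^{-1} A V \;\approx\; D^{-1} (A S)(S^{\top} V).
\]
% If S selects (and rescales) d columns, A S needs only the d sampled keys,
% so the product costs O(n d p) time.
```

Note that the exact $D$ still depends on every entry of $A$, so a practical method has to estimate the row sums as well; this is presumably the role of the adaptive row normalization component.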

Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer were proposed to reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection, we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention with three carefully designed components: column sampling, adaptive row normalization, and pilot sampling reutilization. Experiments on the Long Range Arena (LRA) benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
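As a rough illustration of the column sampling idea mentioned in the abstract, here is a minimal pure-Python sketch. It is not the Skeinformer algorithm: it assumes uniform sampling of keys without replacement and simply renormalizes each row over the sampled keys, whereas the paper describes its own sampling scheme, adaptive row normalization, and pilot sampling reutilization. The function names `exact_attention` and `sampled_attention` are hypothetical.

```python
import math
import random


def exact_attention(Q, K, V):
    """Standard softmax attention on lists of row vectors.

    Computes one score per (query, key) pair, so the cost grows as
    n_queries * n_keys.
    """
    p = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(p) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)  # row normalizer
        out.append([
            sum(w * v[c] for w, v in zip(weights, V)) / z
            for c in range(len(V[0]))
        ])
    return out


def sampled_attention(Q, K, V, d, rng):
    """Approximate attention using only d uniformly sampled keys/values.

    Assumption for illustration: uniform sampling without replacement,
    with each row renormalized over the sampled columns only.
    """
    idx = rng.sample(range(len(K)), d)
    return exact_attention(Q, [K[j] for j in idx], [V[j] for j in idx])


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, d = 64, 8, 16
    Q = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    K = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    V = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    full = exact_attention(Q, K, V)
    approx = sampled_attention(Q, K, V, d, rng)
    err = max(abs(a - b) for ra, rb in zip(full, approx) for a, b in zip(ra, rb))
    print("max abs error with", d, "of", n, "keys:", err)
```

With `d` sampled keys the score computation drops from `n * n` to `n * d` dot products. When `d` equals the sequence length the sample is a permutation of all keys, so the result matches exact attention up to floating-point rounding.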
