CV · Feb 27, 2024

Interactive Multi-Head Self-Attention with Linear Complexity

arXiv:2402.17507v1 · 5 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck in attention mechanisms for machine learning practitioners, offering an incremental improvement over prior efficient methods.

The paper tackles the computational cost of capturing cross-head interactions in multi-head self-attention. It proposes a decomposition that shrinks the attention matrices on which those interactions operate, and it reports favorable performance against existing efficient attention methods and state-of-the-art backbone models.

We propose an efficient interactive method for multi-head self-attention via decomposition. For existing methods using multi-head self-attention, the attention operation of each head is computed independently. However, we show that the interactions between cross-heads of the attention matrix enhance the information flow of the attention operation. Considering that the attention matrix of each head can be seen as a feature of networks, it is beneficial to establish connectivity between them to capture interactions better. However, a straightforward approach to capture the interactions between the cross-heads is computationally prohibitive as the complexity grows substantially with the high dimension of an attention matrix. In this work, we propose an effective method to decompose the attention operation into query- and key-less components. This will result in a more manageable size for the attention matrix, specifically for the cross-head interactions. Extensive experimental results show that the proposed cross-head interaction approach performs favorably against existing efficient attention methods and state-of-the-art backbone models.
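The size argument in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not say how the query- and key-less components are built, so the sketch assumes a simple average pooling of tokens down to `num_landmarks` and a plain H×H mixing matrix for the cross-head step. The function names (`pool_tokens`, `mix_heads`, `decomposed_cross_head`) and all parameters are hypothetical. The point it shows is that mixing heads over two factors of shape N×L and L×N touches far fewer entries than mixing over the full N×N matrices.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def split_heads(x, num_heads):
    # (N, D) -> (H, N, D // H)
    n, d = x.shape
    return x.reshape(n, num_heads, d // num_heads).transpose(1, 0, 2)


def independent_heads(q, k, v):
    # Standard multi-head attention: each head's N x N matrix is computed
    # on its own, with no connection between heads.
    scale = q.shape[-1] ** -0.5
    attn = softmax(q @ k.transpose(0, 2, 1) * scale)  # (H, N, N)
    return attn @ v, attn.size


def pool_tokens(x, num_landmarks):
    # ASSUMPTION: average contiguous groups of tokens to shorten the sequence.
    h, n, d = x.shape
    if n % num_landmarks != 0:
        raise ValueError("sequence length must be divisible by num_landmarks")
    return x.reshape(h, num_landmarks, n // num_landmarks, d).mean(axis=2)


def mix_heads(attn, w):
    # ASSUMPTION: cross-head interaction as a linear map over the head axis,
    # i.e. new_head[g] = sum_h w[g, h] * attn[h].
    return np.einsum("gh,hnm->gnm", w, attn)


def decomposed_cross_head(q, k, v, w_nl, w_ln, num_landmarks):
    # Two small factors replace the N x N matrix; heads interact on the
    # factors, so the mixing cost scales with N * L instead of N * N.
    scale = q.shape[-1] ** -0.5
    q_small = pool_tokens(q, num_landmarks)  # (H, L, d_h)
    k_small = pool_tokens(k, num_landmarks)  # (H, L, d_h)
    attn_nl = softmax(q @ k_small.transpose(0, 2, 1) * scale)  # (H, N, L)
    attn_ln = softmax(q_small @ k.transpose(0, 2, 1) * scale)  # (H, L, N)
    attn_nl = mix_heads(attn_nl, w_nl)
    attn_ln = mix_heads(attn_ln, w_ln)
    out = attn_nl @ (attn_ln @ v)  # (H, N, d_h), never forms an N x N matrix
    return out, attn_nl.size + attn_ln.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((64, 32))  # N = 64 tokens, D = 32 channels
    q = k = v = split_heads(tokens, num_heads=4)
    mix = rng.standard_normal((4, 4))
    _, full_entries = independent_heads(q, k, v)
    _, small_entries = decomposed_cross_head(q, k, v, mix, mix, num_landmarks=8)
    print(full_entries, small_entries)  # 16384 4096
```

With the number of pooled positions L held fixed, the two factors grow linearly in the sequence length N, which is the sense in which a decomposition of this kind keeps cross-head interaction affordable.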

