LG | AI | May 13, 2025

Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain

arXiv:2505.08516v1 | 4 citations | h-index: 12 | IJCAI
Originality: Highly original
AI Analysis

This paper addresses inefficient frequency utilization in self-attention, offering machine learning researchers and practitioners a novel method with broad applicability.

The paper tackles the tendency of self-attention in Transformers to act as a low-pass filter. It proposes the Attentive Graph Filter (AGF), which learns graph filters in the singular value domain with linear complexity, and reports state-of-the-art performance on tasks such as Long Range Arena and time series classification.
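To make the linear-complexity claim concrete, the sketch below compares two ways of applying a filter that is already given in factored form, `U diag(h(sigma)) V^T`, to a signal `X`. It is not the authors' algorithm: the factor shapes (`U` and `V` as n×d matrices), the response function `h`, and the random data are all assumptions chosen for illustration. It only shows why multiplying in the right order costs on the order of n·d² scalar multiplications instead of n²·d.

```python
import random


def matmul(A, B):
    """Return (A @ B, number of scalar multiplications used)."""
    inner, cols = len(B), len(B[0])
    C = [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
         for row in A]
    return C, len(A) * inner * cols


def transpose(M):
    return [list(col) for col in zip(*M)]


def h(s):
    """Hypothetical filter response applied to each singular value."""
    return 1.0 - s


random.seed(0)
n, d = 16, 3  # sequence length and feature dimension (illustrative sizes)
U = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]  # assumed n x d
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]  # assumed n x d
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]  # signal, n x d
sigma = [random.random() for _ in range(d)]

# Fold the filtered singular values into U (n*d scalings, same in both routes).
US = [[U[i][k] * h(sigma[k]) for k in range(d)] for i in range(n)]

# Route 1: materialize the n x n filter, then apply it to X.
F, c1 = matmul(US, transpose(V))   # (n x d) @ (d x n): n*d*n multiplications
Y_dense, c2 = matmul(F, X)         # (n x n) @ (n x d): n*n*d multiplications
dense_cost = c1 + c2

# Route 2: never form the n x n matrix.
VtX, c3 = matmul(transpose(V), X)  # (d x n) @ (n x d): d*n*d multiplications
Y_fact, c4 = matmul(US, VtX)       # (n x d) @ (d x d): n*d*d multiplications
fact_cost = c3 + c4

max_diff = max(abs(a - b)
               for ra, rb in zip(Y_dense, Y_fact)
               for a, b in zip(ra, rb))
print(dense_cost, fact_cost, max_diff)
```

Both routes compute the same product up to floating-point rounding. The dense route grows quadratically in `n`, while the factored route grows linearly in `n` for fixed `d`.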

Transformers have demonstrated remarkable performance across diverse domains. The key component of Transformers is self-attention, which learns the relationship between any two tokens in the input sequence. Recent studies have revealed that self-attention can be understood as a normalized adjacency matrix of a graph. Notably, from the perspective of graph signal processing (GSP), self-attention can be equivalently defined as a simple graph filter that applies GSP with the value vector as the signal. However, self-attention is a graph filter defined with only the first order of the matrix polynomial, and it acts as a low-pass filter, preventing the effective use of diverse frequency information. Consequently, existing self-attention mechanisms are designed in a rather simplified manner. Therefore, we propose a novel method, called Attentive Graph Filter (AGF), which interprets self-attention as learning a graph filter in the singular value domain from the perspective of graph signal processing for directed graphs, with linear complexity w.r.t. the input length $n$, i.e., $\mathcal{O}(nd^2)$. In our experiments, we demonstrate that AGF achieves state-of-the-art performance on various tasks, including the Long Range Arena benchmark and time series classification.
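The low-pass behavior described in the abstract can be illustrated with a small sketch. A softmax attention matrix has positive entries and rows that sum to 1, so each output is a weighted average of the inputs, and repeated application can only shrink the gap between the largest and smallest signal values. The code below uses random queries and keys and a hand-picked alternating signal; these, and the use of max-minus-min as a rough proxy for high-frequency content, are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def softmax_rows(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def apply_filter(A, x):
    """Apply the first-order graph filter A to a scalar signal x (one value per token)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def spread(x):
    """Max minus min: a rough proxy for how much non-constant content remains."""
    return max(x) - min(x)


random.seed(0)
n, d = 8, 4  # number of tokens and head dimension (illustrative sizes)
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Scaled dot-product scores, then softmax: a row-normalized adjacency matrix.
scores = [[sum(q * k for q, k in zip(Q[i], K[j])) / math.sqrt(d)
           for j in range(n)] for i in range(n)]
A = softmax_rows(scores)
row_sums = [sum(row) for row in A]

# A rapidly alternating signal: +1, -1, +1, ...
x = [(-1.0) ** i for i in range(n)]
spreads = [spread(x)]
for _ in range(5):
    x = apply_filter(A, x)
    spreads.append(spread(x))

print(spreads)
```

The printed spreads never increase from one application to the next, which is the smoothing effect that a first-order, attention-only filter cannot avoid. A filter with a learnable response over the spectrum is not bound by this averaging behavior.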
