LG AI | Dec 7, 2023

Graph Convolutions Enrich the Self-Attention in Transformers!

Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park

arXiv:2312.04234v513.720 citations | h-index: 12 | Has Code | NIPS

Originality: Incremental advance

AI Analysis

The paper addresses oversmoothing, a critical bottleneck in deep Transformers that matters to AI researchers and practitioners. The advance is incremental because it builds on the existing self-attention mechanism.

The paper tackles the oversmoothing problem in deep Transformer models, where token representations become indistinguishable as depth increases, by proposing a graph-filter-based self-attention (GFSA). GFSA improves performance in several fields, including computer vision, natural language processing, and speech recognition.

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective graph filter, although its complexity is slightly higher than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
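
The graph-filter reading of self-attention in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' GFSA implementation: the polynomial form, the fixed coefficients, the toy sizes, and every function name below are assumptions chosen for illustration. The sketch treats a row-stochastic attention matrix A as the adjacency matrix of a graph over tokens, shows that applying A repeatedly shrinks the spread between token representations (oversmoothing), and builds a generic polynomial filter sum_k w_k A^k, of which plain self-attention is the special case w = [0, 1].

```python
import math
import random


def softmax_rows(scores):
    """Row-wise softmax: every row becomes a probability distribution."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def token_spread(x):
    """Largest gap between any two tokens within a single feature column."""
    return max(max(col) - min(col) for col in zip(*x))


def polynomial_filter(a, coeffs):
    """Return sum_k coeffs[k] * A^k. The coefficients are hypothetical and
    fixed here; a learned filter would train them instead."""
    n = len(a)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 = I
    result = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        result = [[r + c * p for r, p in zip(r_row, p_row)]
                  for r_row, p_row in zip(result, power)]
        power = matmul(power, a)
    return result


random.seed(0)
n_tokens, dim = 6, 4  # toy sizes, assumed for illustration
x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
scores = [[random.gauss(0, 1) for _ in range(n_tokens)] for _ in range(n_tokens)]
attn = softmax_rows(scores)  # row-stochastic: acts as a low-pass graph filter

# Stacking attention-only layers: each layer averages tokens, so the spread shrinks.
plain = x
for _ in range(30):
    plain = matmul(attn, plain)

# A different polynomial, I - A, is high-pass: it removes the token-wise average
# instead of reinforcing it.
filt = polynomial_filter(attn, [1.0, -1.0])
filtered = matmul(filt, x)

print("spread of input:            ", token_spread(x))
print("spread after 30 x attention:", token_spread(plain))
print("spread after (I - A) filter:", token_spread(filtered))
```

Because each output row of `matmul(attn, ...)` is a convex combination of the input rows, the spread can only shrink from layer to layer, which is the oversmoothing effect the abstract describes. Allowing other coefficients lets the filter keep or amplify differences between tokens, at the cost of computing extra matrix powers.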
