LG, AI | May 27, 2023

Graph Inductive Biases in Transformers without Message Passing

arXiv:2305.17589v1 | 197 citations
Originality: Incremental advance
AI Analysis

This work addresses a key problem in graph machine learning: making Graph Transformers effective without message passing. It could benefit researchers and practitioners in fields such as social network analysis and bioinformatics. The advance is rated incremental because it builds on existing Transformer architectures.

The paper tackles the challenge of incorporating graph inductive biases into Transformers without message-passing modules, which can cause issues such as over-smoothing and make it harder to transfer research advances from Transformers in other domains. It proposes GRIT, a Graph Transformer that achieves state-of-the-art performance across a variety of graph datasets.

Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
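
To make the "random walk probabilities" and "shortest path distances" in the abstract concrete, here is a minimal sketch in plain Python. It assumes the random walk probabilities are the entries of successive powers of the row-normalized adjacency matrix M = D^{-1}A, stacked per node pair as [I, M, M^2, ..., M^{K-1}]; the abstract does not spell out this form, so treat it as an assumption. The function names (`random_walk_matrix`, `rrwp`, `shortest_path_from_rrwp`) are hypothetical, and the learned part of the encoding and the attention mechanism are not shown.

```python
def random_walk_matrix(adj):
    """Row-normalize a dense adjacency matrix: M = D^{-1} A (assumed form)."""
    m = []
    for row in adj:
        deg = sum(row)
        # Isolated nodes get an all-zero row instead of dividing by zero.
        m.append([a / deg if deg else 0.0 for a in row])
    return m


def matmul(x, y):
    """Dense square matrix product, kept dependency-free for illustration."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rrwp(adj, num_steps):
    """Return enc with enc[i][j] = [I_ij, M_ij, ..., (M^(num_steps-1))_ij]."""
    n = len(adj)
    m = random_walk_matrix(adj)
    # Start from the identity matrix, i.e. the zero-step random walk.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    enc = [[[] for _ in range(n)] for _ in range(n)]
    for _ in range(num_steps):
        for i in range(n):
            for j in range(n):
                enc[i][j].append(power[i][j])
        power = matmul(power, m)
    return enc


def shortest_path_from_rrwp(enc_ij):
    """Smallest step k with nonzero walk probability, or None if none is seen.

    On an unweighted graph, (M^k)_ij > 0 exactly when a walk of length k
    from i to j exists, so the first such k is the shortest path distance.
    """
    for k, p in enumerate(enc_ij):
        if p > 0:
            return k
    return None


# Path graph 0 - 1 - 2 - 3 as a toy example.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
enc = rrwp(adj, num_steps=4)
print(enc[0][3])                           # per-step probabilities for pair (0, 3)
print(shortest_path_from_rrwp(enc[0][3]))  # hop distance between nodes 0 and 3
```

The sketch uses dense lists, so it only suits small graphs; a practical version would use sparse matrices. It also shows why encodings of this kind can carry shortest path information: the index of the first nonzero entry for a node pair is its hop distance, provided `num_steps` exceeds that distance.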

Code Implementations: 2 repos
