LG | May 20, 2024

Distinguished In Uniform: Self Attention Vs. Virtual Nodes

arXiv:2405.11951v1 | 20 citations | h-index: 13 | Has Code | ICLR
Originality: Synthesis-oriented
AI Analysis

This work compares the theoretical expressivity of graph processing models. Its contribution is incremental: it builds on prior universality results.

The paper clarifies that graph transformers are not unique in being universal approximators: given the same positional encodings, pure MPGNNs and even 2-layer MLPs are also non-uniform universal approximators. It then proves that neither graph transformers nor MPGNNs with virtual nodes are uniformly universal, and that neither model's uniform expressivity subsumes the other's. Experiments on synthetic data illustrate the theory, and mixed results on real-world datasets show no clear ranking in practice.

Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations:

1. The initial node features must be augmented with certain positional encodings.
2. The approximation is non-uniform: graphs of different sizes may require a different approximating network.

We first clarify that this form of universality is not unique to GTs: using the same positional encodings, pure MPGNNs and even 2-layer MLPs are also non-uniform universal approximators. We then consider uniform expressivity: the target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is in their global computation method: Self-Attention vs. Virtual Node. We prove that neither model is a uniform-universal approximator, before proving our main result: neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data. We further augment our study with real-world datasets, observing mixed results that indicate no clear ranking in practice either.
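To make the Self-Attention vs. Virtual Node distinction concrete, here is a minimal NumPy sketch of the two global computation steps. It is an illustration under simplifying assumptions, not the paper's model definitions: the function names, the single attention head, the sum-pooling virtual node, and the weight shapes are all hypothetical choices made for this example, and the local message-passing layers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_global(H, Wq, Wk, Wv):
    """Hypothetical single-head self-attention over all n nodes.

    H has shape (n, d). Every node attends to every node, so the cost
    is quadratic in n. Each output row is a softmax-weighted average
    of the value rows.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    return softmax(scores, axis=1) @ V


def virtual_node_global(H, W):
    """Hypothetical virtual-node step with sum pooling.

    All node states are summed into one virtual-node state, which is
    transformed by W and broadcast back to every node. The cost is
    linear in n.
    """
    pooled = H.sum(axis=0)      # virtual-node state, shape (d,)
    return H + pooled @ W       # broadcast to all n rows


# Toy comparison: graphs of two sizes whose nodes all carry the same feature.
rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
small, large = np.ones((3, d)), np.ones((7, d))

attn_small = self_attention_global(small, Wq, Wk, Wv)
attn_large = self_attention_global(large, Wq, Wk, Wv)
vn_small = virtual_node_global(small, np.eye(d))
vn_large = virtual_node_global(large, np.eye(d))

# Averaging cannot see the graph size here; summing can.
print(np.allclose(attn_small[0], attn_large[0]))
print(np.allclose(vn_small[0], vn_large[0]))
```

In this toy setting the attention output per node is identical for both graph sizes, because softmax weights always sum to one, while the sum-pooled virtual node produces values that grow with the number of nodes. This only hints at why the two mechanisms can behave differently when a single network must handle graphs of all sizes; it is not a substitute for the paper's proofs.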

Code Implementations: 1 repo
