LG · Oct 27, 2021

Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs

arXiv:2110.14416v2 · 48 citations · Has Code
Originality: Highly original
AI Analysis

This work addresses the problem of efficiently processing higher-order data structures. It extends Transformers to such data and reports competitive or superior results compared with invariant MLPs and message-passing GNNs.

The authors generalized Transformers to handle permutation-invariant data like sets, graphs, and hypergraphs, achieving significant performance improvements over existing methods in large-scale tasks such as graph regression and set-to-(hyper)graph prediction.

We present a generalization of Transformers to permutation-invariant data of any order (sets, graphs, and hypergraphs). We begin by observing that Transformers generalize DeepSets, or first-order (set-input) permutation-invariant MLPs. Then, based on recently characterized higher-order invariant MLPs, we extend the concept of self-attention to higher orders and propose higher-order Transformers for order-$k$ data ($k=2$ for graphs and $k>2$ for hypergraphs). Unfortunately, higher-order Transformers turn out to have prohibitive complexity $\mathcal{O}(n^{2k})$ in the number of input nodes $n$. To address this problem, we present sparse higher-order Transformers that have quadratic complexity in the number of input hyperedges, and further adopt the kernel attention approach to reduce the complexity to linear. In particular, we show that sparse second-order Transformers with kernel attention are theoretically more expressive than message-passing operations while having asymptotically identical complexity. Our models achieve significant performance improvements over invariant MLPs and message-passing graph neural networks in large-scale graph regression and set-to-(hyper)graph prediction tasks. Our implementation is available at https://github.com/jw9730/hot.
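The observation that Transformers generalize DeepSets can be illustrated with a small numerical sketch. It is not taken from the authors' repository, and the construction in the paper may differ. It assumes a mean-pooling form of the DeepSets linear layer and a single-head attention layer with a linear per-element term. All function and weight names (`deepsets_layer`, `attention_layer`, `W_self`, `W_pool`) are hypothetical. With zero query and key weights, every attention score is equal, the softmax becomes uniform, and the attention layer reduces to the DeepSets layer.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_rows(S):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in S:
        m = max(row)
        e = [math.exp(s - m) for s in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out


def deepsets_layer(X, W_self, W_pool):
    """Assumed DeepSets-style layer: per-element term plus a mean-pooled term."""
    n = len(X)
    mean = [[sum(col) / n for col in zip(*X)]]  # 1 x d mean over set elements
    pooled = matmul(mean, W_pool)[0]            # same vector added to every element
    return [[s + p for s, p in zip(row, pooled)] for row in matmul(X, W_self)]


def attention_layer(X, W_self, W_q, W_k, W_v):
    """Assumed single-head softmax attention plus a linear per-element term."""
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(W_q[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
    A = softmax_rows(scores)                    # n x n attention weights
    attended = matmul(A, V)
    return [[s + a for s, a in zip(rs, ra)] for rs, ra in zip(matmul(X, W_self), attended)]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 5, 4
X = rand_matrix(n, d, rng)
W_self = rand_matrix(d, d, rng)
W_pool = rand_matrix(d, d, rng)
W_zero = [[0.0] * d for _ in range(d)]

out_ds = deepsets_layer(X, W_self, W_pool)
# Zero query/key weights give constant scores, hence uniform attention (1/n each).
out_attn = attention_layer(X, W_self, W_zero, W_zero, W_pool)

max_gap = max(abs(a - b) for ra, rb in zip(out_ds, out_attn) for a, b in zip(ra, rb))
print("max abs difference:", max_gap)
```

The two outputs agree up to floating-point rounding. With nonzero query and key weights the attention weights become input-dependent, which is the extra flexibility a DeepSets layer does not have.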
