Cluster Attention for Graph Machine Learning
This work aims to improve graph machine learning models for researchers and practitioners with a method that combines a large receptive field with graph-structure-based inductive biases; the contribution is incremental, since it builds on existing approaches.
The paper tackles two limitations: the limited receptive field of Message Passing Neural Networks and the lack of graph-structure-based inductive biases in Graph Transformers. It proposes cluster attention (CLATT), which divides nodes into clusters and applies attention within them, and reports significant performance improvements on a wide range of graph datasets, including real-world benchmarks.
Message Passing Neural Networks have recently become the most popular approach to graph machine learning tasks; however, their receptive field is limited by the number of message passing layers. To increase the receptive field, Graph Transformers with global attention have been proposed; however, global attention does not take into account the graph topology and thus lacks graph-structure-based inductive biases, which are typically very important for graph machine learning tasks. In this work, we propose an alternative approach: cluster attention (CLATT). We divide graph nodes into clusters with off-the-shelf graph community detection algorithms and let each node attend to all other nodes in each cluster. CLATT provides large receptive fields while still having strong graph-structure-based inductive biases. We show that augmenting Message Passing Neural Networks or Graph Transformers with CLATT significantly improves their performance on a wide range of graph datasets including datasets from the recently introduced GraphLand benchmark representing real-world applications of graph machine learning.
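The idea of restricting attention to clusters can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, random projection matrices, and a hand-written `clusters` list standing in for the output of a community detection algorithm. It also assumes each node attends to itself as well as to the other nodes in its cluster, which keeps the softmax well defined for singleton clusters. The function name `cluster_attention` and the helper `project` are hypothetical.

```python
import math
import random


def project(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def cluster_attention(x, clusters, w_q, w_k, w_v):
    """Single-head attention in which node i attends only to nodes sharing its cluster label.

    Assumption: a node's own cluster includes the node itself.
    """
    q = [project(row, w_q) for row in x]
    k = [project(row, w_k) for row in x]
    v = [project(row, w_v) for row in x]
    scale = math.sqrt(len(k[0]))
    n = len(x)
    outputs, weights = [], []
    for i in range(n):
        # Nodes in the same cluster as node i (never empty, since i is a member).
        members = [j for j in range(n) if clusters[j] == clusters[i]]
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / scale for j in members]
        # Numerically stable softmax over the cluster members only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [0.0] * n  # nodes outside the cluster keep weight zero
        for j, e in zip(members, exps):
            row[j] = e / total
        weights.append(row)
        outputs.append([sum(row[j] * v[j][c] for j in members) for c in range(len(v[0]))])
    return outputs, weights


# Toy example: 6 nodes with 4 features each, split into two hypothetical clusters.
random.seed(0)
n, d = 6, 4
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
w_q, w_k, w_v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(3))
clusters = [0, 0, 0, 1, 1, 1]  # placeholder for community detection output
out, weights = cluster_attention(x, clusters, w_q, w_k, w_v)
```

In this sketch the receptive field of a node is its whole cluster, regardless of how many hops separate the nodes, while nodes in different clusters never exchange information through the attention step. In practice the cluster labels would come from a graph community detection algorithm, so the attention pattern follows the graph topology rather than being fully global.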