Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
This work addresses the scalability and depth limitations of GNNs in large graph representation learning and offers a more efficient alternative to attention-based methods. The contribution is incremental, since it builds on existing message-passing and Transformer concepts.
The authors tackle the problem of scaling deep graph neural networks (GNNs) to large graphs by proposing Scalable Message Passing Neural Networks (SMPNNs), which replace attention with standard convolutional message passing inside a Pre-Layer Normalization Transformer-style block. The resulting models are competitive with the state of the art in large graph transductive learning and outperform the best Graph Transformers in the literature, without the computational and memory cost of attention.
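The block described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a GCN-style symmetric normalization for the convolution, a SiLU nonlinearity, a pointwise feed-forward sub-block, and a hypothetical residual scale `alpha`. The function names and the dense adjacency matrix are chosen for readability only; a large graph would call for sparse operations.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Dense D^{-1/2} (A + I) D^{-1/2} for an undirected graph (assumed GCN-style)."""
    a = np.eye(num_nodes)  # self-loops
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def layer_norm(x, eps=1e-5):
    """Per-node normalization over the feature axis (no learned scale or shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def silu(x):
    return x / (1.0 + np.exp(-x))

def smpnn_style_block(x, a_hat, w_conv, w_ff, alpha=1.0):
    """Pre-LN block where neighbor aggregation stands in for attention."""
    # Sub-block 1: normalize, aggregate over neighbors, add back the input.
    h = x + alpha * silu(a_hat @ layer_norm(x) @ w_conv)
    # Sub-block 2: normalize, pointwise feed-forward, add back the input.
    return h + alpha * silu(layer_norm(h) @ w_ff)

# Stack several blocks on a 4-node path graph with 8 features per node.
rng = np.random.default_rng(0)
a_hat = normalized_adjacency([(0, 1), (1, 2), (2, 3)], num_nodes=4)
x = rng.normal(size=(4, 8))
for _ in range(16):
    w_conv = 0.1 * rng.normal(size=(8, 8))
    w_ff = 0.1 * rng.normal(size=(8, 8))
    x = smpnn_style_block(x, a_hat, w_conv, w_ff)
print(x.shape)
```

The aggregation step costs one multiplication by the adjacency matrix, which scales with the number of edges when stored sparsely, whereas full self-attention compares every pair of nodes.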
We propose Scalable Message Passing Neural Networks (SMPNNs) and demonstrate that, by integrating standard convolutional message passing into a Pre-Layer Normalization Transformer-style block instead of attention, we can produce high-performing deep message-passing-based Graph Neural Networks (GNNs). This modification yields results competitive with the state-of-the-art in large graph transductive learning, particularly outperforming the best Graph Transformers in the literature, without requiring the otherwise computationally and memory-expensive attention mechanism. Our architecture not only scales to large graphs but also makes it possible to construct deep message-passing networks, unlike simple GNNs, which have traditionally been constrained to shallow architectures due to oversmoothing. Moreover, we provide a new theoretical analysis of oversmoothing based on universal approximation which we use to motivate SMPNNs. We show that in the context of graph convolutions, residual connections are necessary for maintaining the universal approximation properties of downstream learners and that removing them can lead to a loss of universality.
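The claim about residual connections can be made concrete with a toy example. The sketch below is an illustration of the general idea and not the paper's proof: it assumes a two-node graph with self-loops and a purely linear, weight-free convolution. Without a residual term, two different feature vectors are averaged to the same output, so no downstream learner could tell them apart; with the residual term the inputs remain distinguishable.

```python
import numpy as np

# Normalized adjacency (with self-loops) of a two-node graph joined by one edge.
a_hat = np.array([[0.5, 0.5],
                  [0.5, 0.5]])

# Two different feature assignments (one scalar feature per node).
x1 = np.array([[1.0], [-1.0]])
x2 = np.array([[3.0], [-3.0]])

# Convolution without a residual: both inputs are averaged to the same output.
no_res_1 = a_hat @ x1
no_res_2 = a_hat @ x2

# Convolution with a residual: the original signal is carried along.
res_1 = x1 + a_hat @ x1
res_2 = x2 + a_hat @ x2

print("outputs coincide without residual:", np.allclose(no_res_1, no_res_2))
print("outputs coincide with residual:   ", np.allclose(res_1, res_2))

# The same fact in matrix terms: a_hat is rank-deficient, I + a_hat is not.
rank_no_res = np.linalg.matrix_rank(a_hat)
rank_res = np.linalg.matrix_rank(np.eye(2) + a_hat)
print("rank without residual:", rank_no_res, "| rank with residual:", rank_res)
```

In this toy setting the map without a residual is not injective, which is one simple way information can be lost before it reaches a downstream learner.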