Rethinking Graph Transformer Architecture Design for Node Classification
This work addresses scalability and noise issues in graph-based node classification, offering a practical improvement for researchers and practitioners in graph machine learning. The contribution is incremental, since it builds on existing Graph Transformer frameworks.
The authors tackle two limitations of Graph Transformers in node classification, susceptibility to global noise and poor scalability, by proposing GNNFormer, a decoupled propagation-transformation architecture that they report adapts to both homophilous and heterophilous scenarios across 12 benchmark datasets.
Graph Transformer (GT), a special type of Graph Neural Network (GNN), uses multi-head attention to facilitate high-order message passing. However, this also imposes several limitations in node classification applications: 1) nodes are susceptible to global noise; 2) self-attention computation does not scale well to large graphs. In this work, we conduct extensive observational experiments to explore the adaptability of the GT architecture in node classification tasks and draw several conclusions: the current multi-head self-attention module in GT can be completely replaced, while the feed-forward neural network module proves to be valuable. Based on this, we decouple the propagation (P) and transformation (T) of GNNs and explore a powerful GT architecture, named GNNFormer, which is based on P/T combination message passing and adapted for node classification in both homophilous and heterophilous scenarios. Extensive experiments on 12 benchmark datasets demonstrate that our proposed GT architecture can effectively adapt to node classification tasks without being affected by global noise or limited by computational efficiency.
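To make the decoupling of propagation (P) and transformation (T) concrete, the following is a minimal NumPy sketch of a single P-then-T block. It is an illustration under stated assumptions and not the GNNFormer implementation: the abstract does not specify the propagation operator or the layer layout, so the symmetric normalization with self-loops, the two-layer ReLU feed-forward network, and all function names (`normalize_adjacency`, `propagate`, `transform`, `pt_block`) are hypothetical choices made here for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Assumed propagation operator: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(a_norm, h):
    """P step: parameter-free aggregation over local neighbors."""
    return a_norm @ h

def transform(h, w1, w2):
    """T step: per-node two-layer feed-forward network with ReLU."""
    return np.maximum(h @ w1, 0.0) @ w2

def pt_block(a_norm, x, w1, w2):
    """One hypothetical P/T combination: propagate first, then transform."""
    return transform(propagate(a_norm, x), w1, w2)

# Toy path graph with 4 nodes, 3 input features, 2 output classes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
w1 = rng.normal(size=(3, 8))   # random weights stand in for trained ones
w2 = rng.normal(size=(8, 2))

a_norm = normalize_adjacency(adj)
logits = pt_block(a_norm, x, w1, w2)       # shape (4, 2)
```

In this sketch the P step mixes information only along existing edges, so a node never attends to every other node in the graph, and with a sparse adjacency matrix its cost grows with the number of edges rather than with the square of the number of nodes as in full self-attention. The T step is the feed-forward component that the abstract reports as valuable. Other orderings of P and T (for example several P steps followed by one T step) can be composed from the same two functions.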