Stronger Graph Transformer with Regularized Attention Scores
This work addresses memory efficiency for researchers and practitioners using Graph Neural Networks, but it is incremental as it builds on existing Graph Transformer methods.
The paper tackles the memory consumption issue in Graph Transformers by proposing an edge regularization technique that eliminates the need for positional encoding, resulting in stable performance improvements compared to Graph Transformers without positional encoding.
Graph Neural Networks are notorious for their memory consumption. A recent Transformer-based GNN, the Graph Transformer (GT), has been shown to obtain superior performance when long-range dependencies exist. However, combining graph data with the Transformer architecture compounds the memory problem. We propose a novel version of an "edge regularization technique" that alleviates the need for Positional Encoding and thereby alleviates GT's out-of-memory issue. It remains unclear whether applying edge regularization on top of Positional Encoding is helpful. However, applying our edge regularization technique does stably improve GT's performance compared to GT without Positional Encoding.
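The abstract does not specify the form of the regularizer. As an illustration only, one plausible way to regularize attention scores with edge information is to add an auxiliary loss that penalizes disagreement between the raw attention scores and the graph's adjacency matrix. The sketch below assumes a binary cross-entropy penalty on sigmoid-transformed scores; the function names, the weight `lam`, and the choice of loss are all hypothetical and may differ from the method in the paper.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def edge_regularization_loss(scores, adjacency, eps=1e-12):
    """Mean binary cross-entropy between sigmoid(scores) and adjacency.

    ASSUMPTION: this loss form is illustrative, not taken from the paper.
    scores:    n x n list of raw (pre-softmax) attention scores.
    adjacency: n x n list of 0/1 entries, 1 where an edge exists.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = sigmoid(scores[i][j])
            a = adjacency[i][j]
            total -= a * math.log(p + eps) + (1 - a) * math.log(1.0 - p + eps)
    return total / (n * n)


def total_loss(task_loss, scores, adjacency, lam=0.1):
    # ASSUMPTION: the regularizer is added to the task loss with weight lam.
    return task_loss + lam * edge_regularization_loss(scores, adjacency)


# A 3-node path graph: 0 - 1 - 2.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Scores that agree with the edges (high on edges, low elsewhere).
aligned = [[4.0 if a == 1 else -4.0 for a in row] for row in adjacency]
# Scores that contradict the edges.
misaligned = [[-4.0 if a == 1 else 4.0 for a in row] for row in adjacency]

print(edge_regularization_loss(aligned, adjacency))
print(edge_regularization_loss(misaligned, adjacency))
```

Under this assumed form, attention scores that agree with the graph structure incur a smaller penalty than scores that contradict it, so structural information reaches the attention layer through the loss rather than through a separate positional encoding input.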