LG CG GN · Apr 4, 2024

On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers

MIT

arXiv:2404.03380v114.214 citations · h-index: 6 · Has Code · AISTATS

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and expressive graph transformers in machine learning, offering incremental improvements through sparsification strategies.

The paper addresses the limited theoretical understanding of, and empirical evidence for, higher-order graph transformers by analyzing their expressive power relative to the $k$-WL test. It shows that a plain order-$k$ transformer without additional structural information is less expressive than $k$-WL, while a neighborhood-based sparse variant is as expressive as $k$-WL and computationally efficient.

Graph transformers have recently received significant attention in graph learning, partly due to their ability to capture more global interactions via self-attention. Nevertheless, while higher-order graph neural networks have been reasonably well studied, the extension of graph transformers to higher-order variants is just starting, and both theoretical understanding and empirical results are limited. In this paper, we provide a systematic study of the theoretical expressive power of order-$k$ graph transformers and their sparse variants. We first show that an order-$k$ graph transformer without additional structural information is less expressive than the $k$-Weisfeiler-Lehman ($k$-WL) test, despite its high computational cost. We then explore strategies to both sparsify and enhance higher-order graph transformers, aiming to improve both their efficiency and their expressiveness. Indeed, sparsification based on neighborhood information can enhance the expressive power, as it provides additional information about the input graph structure. In particular, we show that a natural neighborhood-based sparse order-$k$ transformer model is not only computationally efficient but also as expressive as the $k$-WL test. We further study several other computationally efficient sparse graph attention models and provide an analysis of their expressiveness. Finally, we provide experimental results to show the effectiveness of the different sparsification strategies.
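To make the efficiency argument concrete, the following minimal sketch counts attention pairs for dense order-$k$ attention versus a neighborhood-based sparse pattern. It is an illustration only, not the paper's implementation: it assumes that "neighborhood" means the $k$-WL-style neighborhood, in which a $k$-tuple attends only to tuples that differ from it in at most one coordinate. The function names `k_tuples`, `wl_neighbors`, and `attention_pair_counts` are hypothetical.

```python
from itertools import product


def k_tuples(n, k):
    """All ordered k-tuples over nodes 0..n-1 (there are n**k of them)."""
    return list(product(range(n), repeat=k))


def wl_neighbors(t, n):
    """Tuples obtained from t by replacing one coordinate with any node.

    Assumption: this k-WL-style neighborhood stands in for the
    "neighborhood-based" sparsification described in the abstract.
    The tuple t itself is included (replacement equal to the original entry).
    """
    out = set()
    for j in range(len(t)):
        for w in range(n):
            out.add(t[:j] + (w,) + t[j + 1:])
    return out


def attention_pair_counts(n, k):
    """Return (dense, sparse) counts of (query, key) tuple pairs."""
    tuples = k_tuples(n, k)
    dense = len(tuples) ** 2  # every tuple attends to every tuple: n**(2k)
    sparse = sum(len(wl_neighbors(t, n)) for t in tuples)  # n**k * (k*(n-1) + 1)
    return dense, sparse


if __name__ == "__main__":
    for n in (4, 8, 16):
        dense, sparse = attention_pair_counts(n, k=2)
        print(f"n={n}: dense={dense}, sparse={sparse}")
```

Under this assumption, dense attention over $k$-tuples touches $n^{2k}$ pairs, while the sparse pattern touches $n^k (k(n-1)+1)$ pairs, so the gap widens quickly as $n$ or $k$ grows.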

