Transformers are efficient hierarchical chemical graph learners
This work addresses a computational bottleneck for researchers and practitioners in cheminformatics and graph learning, offering a more efficient alternative to existing graph transformers.
The paper tackles the computational inefficiency of graph transformers by introducing SubFormer, which operates on subgraphs to reduce the token count and improve the learning of long-range interactions. It achieves competitive performance on molecular property prediction benchmarks at significantly lower computational cost (training in minutes on a consumer-grade GPU).
Transformers, adapted from natural language processing, are emerging as a leading approach for graph representation learning. Contemporary graph transformers often treat nodes or edges as separate tokens. This approach leads to computational challenges for even moderately sized graphs because the cost of self-attention scales quadratically with the token count. In this paper, we introduce SubFormer, a graph transformer that operates on subgraphs that aggregate information by a message-passing mechanism. This approach reduces the number of tokens and enhances the learning of long-range interactions. We demonstrate SubFormer on benchmarks for predicting molecular properties from chemical structures and show that it is competitive with state-of-the-art graph transformers at a fraction of the computational cost, with training times on the order of minutes on a consumer-grade graphics card. We interpret the attention weights in terms of chemical structures. We show that SubFormer exhibits limited over-smoothing and avoids over-squashing, which is prevalent in traditional graph neural networks.
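To make the quadratic-scaling argument concrete, the following minimal sketch counts the pairwise attention scores computed per head and layer when a molecule is tokenized by atom versus by subgraph. The molecule size (30 atoms), the number of subgraph tokens (8), and the helper name `attention_pairs` are illustrative assumptions, not figures or code from the paper.

```python
def attention_pairs(num_tokens: int) -> int:
    """Pairwise attention scores per head per layer: one for every
    ordered (query, key) pair, hence num_tokens squared."""
    return num_tokens * num_tokens


# Hypothetical molecule (assumed numbers, for illustration only):
# 30 atoms, grouped into 8 subgraph tokens.
num_atoms = 30
num_subgraphs = 8

node_cost = attention_pairs(num_atoms)          # one token per atom
subgraph_cost = attention_pairs(num_subgraphs)  # one token per subgraph
reduction = node_cost / subgraph_cost

print(f"node-level tokens:     {node_cost} attention scores")
print(f"subgraph-level tokens: {subgraph_cost} attention scores")
print(f"reduction factor:      {reduction:.1f}x")
```

Under these assumed counts, shrinking the token count by a factor of about 3.75 cuts the attention cost by roughly 14x, since the saving grows with the square of the token reduction.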