Graph-Aware Transformer: Is Attention All Graphs Need?
This work addresses the problem of applying Transformers to graph data for tasks such as classification and generation in domains such as chemistry, though the contribution is incremental in that it adapts an existing model to a new data type.
The authors tackle the incompatibility of Transformers with non-sequential graph data by proposing GRAT, a Transformer-based encoder-decoder model that adapts self-attention to edge information and uses a two-path decoding mechanism, achieving state-of-the-art performance on 4 regression tasks in the QM9 benchmark.
Graphs are the natural data structure for representing relational and structural information in many domains. To cover the broad range of graph-data applications, including graph classification as well as graph generation, it is desirable to have a general and flexible model consisting of an encoder and a decoder that can handle graph data. Although the representative encoder-decoder model, the Transformer, shows superior performance in various tasks, especially in natural language processing, it is not immediately applicable to graphs because of their non-sequential nature. To tackle this incompatibility, we propose the GRaph-Aware Transformer (GRAT), the first Transformer-based model that can encode and decode whole graphs in an end-to-end fashion. GRAT features a self-attention mechanism that adapts to edge information and an auto-regressive decoding mechanism based on a two-path approach, in which each decoding step consists of a sub-graph encoding path and a node-and-edge generation path. We empirically evaluated GRAT on multiple setups, including encoder-based tasks such as molecular property prediction on the QM9 dataset and encoder-decoder-based tasks such as molecular graph generation in the organic molecule synthesis domain. GRAT shows very promising results, including state-of-the-art performance on 4 regression tasks in the QM9 benchmark.
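The abstract does not specify how self-attention is made adaptive to edge information, so the following is only an illustrative sketch of one common way to do it, not GRAT's actual formulation: a scalar bias that depends on the edge type between two nodes is added to the scaled dot-product attention logit. The function name `edge_biased_attention`, the `edge_bias` table, and the toy graph are all hypothetical, and the learned query, key, and value projections of a real Transformer layer are omitted for brevity.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def edge_biased_attention(x, edge_type, edge_bias):
    """Single-head self-attention with an additive edge-type bias.

    x:         list of n node feature vectors, each of length d
    edge_type: n x n matrix of integer edge labels (0 = no edge)
    edge_bias: dict mapping edge label -> scalar added to the logit
               (hypothetical; in practice this would be learned)
    """
    n, d = len(x), len(x[0])
    scale = math.sqrt(d)
    weights = []
    for i in range(n):
        # Assumption: queries and keys are the raw node features.
        logits = [
            sum(a * b for a, b in zip(x[i], x[j])) / scale
            + edge_bias[edge_type[i][j]]
            for j in range(n)
        ]
        weights.append(softmax(logits))
    # Each output vector is the attention-weighted sum of node features.
    out = [
        [sum(weights[i][j] * x[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return out, weights

# Toy path graph 0 - 1 - 2 with a single edge type (label 1).
# The diagonal is labeled 0, so self-attention gets the "no edge" bias.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edge_type = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
edge_bias = {0: -2.0, 1: 1.0}

out, weights = edge_biased_attention(x, edge_type, edge_bias)
```

In this toy setup, node 0 attends more strongly to its neighbor, node 1, than to the non-adjacent node 2, even though node 2 has the larger raw dot product with node 0. This is the kind of structural sensitivity that plain sequence attention lacks.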