BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation
This work addresses molecular representation learning for AI-driven drug discovery, offering a simpler and more efficient self-supervised method than existing multi-task pre-training approaches, though the contribution is incremental.
The paper tackles molecular representation learning for AI-driven drug discovery, especially when labeled data are scarce. It proposes BatmanNet, a bi-branch masked graph transformer autoencoder that learns local and global information through self-supervised reconstruction, and achieves state-of-the-art results on 13 benchmark datasets covering molecular property prediction, drug-drug interaction, and drug-target interaction.
Although substantial efforts have been made to apply graph neural networks (GNNs) to AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets transfer better to downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, which makes them time-consuming, computationally expensive, and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy that learns local and global information about molecules simultaneously, and we propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored, complementary, and asymmetric graph autoencoders that reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the quality of the learned molecular representations. BatmanNet achieves state-of-the-art results on multiple drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, across 13 benchmark datasets, demonstrating its potential for molecular representation learning.
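
To make the masking step concrete, the following is a minimal sketch of how nodes (atoms) and edges (bonds) of a molecular graph might be randomly hidden before reconstruction. It is an illustration under stated assumptions, not the authors' implementation: the function names `split_visible_masked` and `mask_molecular_graph`, the 0.5 mask ratio, and the choice to mask atoms and bonds independently are all hypothetical, and the transformer encoders and decoders of the two branches are not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

Bond = Tuple[int, int]  # a bond is a pair of atom indices


def split_visible_masked(items: Sequence, mask_ratio: float, rng: random.Random):
    """Randomly split items into (visible, masked) lists, preserving order."""
    n_masked = int(len(items) * mask_ratio)  # rounds down
    masked_idx = set(rng.sample(range(len(items)), n_masked))
    visible = [x for i, x in enumerate(items) if i not in masked_idx]
    masked = [x for i, x in enumerate(items) if i in masked_idx]
    return visible, masked


def mask_molecular_graph(atoms: List[str], bonds: List[Bond],
                         mask_ratio: float = 0.5, seed: int = 0) -> Dict[str, list]:
    """Hide a fraction of atoms and bonds; the hidden parts are the reconstruction targets.

    Assumption: atoms and bonds are masked independently at the same ratio.
    The source does not specify the ratio or the sampling scheme.
    """
    rng = random.Random(seed)
    indexed_atoms = list(enumerate(atoms))  # keep each atom's index with its symbol
    visible_atoms, masked_atoms = split_visible_masked(indexed_atoms, mask_ratio, rng)
    visible_bonds, masked_bonds = split_visible_masked(bonds, mask_ratio, rng)
    return {
        "visible_atoms": visible_atoms,  # input to a node-branch encoder
        "masked_atoms": masked_atoms,    # targets for a node-branch decoder
        "visible_bonds": visible_bonds,  # input to an edge-branch encoder
        "masked_bonds": masked_bonds,    # targets for an edge-branch decoder
    }


# Heavy-atom graph of ethanol: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
result = mask_molecular_graph(atoms, bonds, mask_ratio=0.5, seed=0)
print(result)
```

In a setup like this, each branch would encode only the visible part of the graph and be trained to predict the masked part, so no labels are needed. A real pipeline would also carry atom and bond feature vectors rather than bare symbols and index pairs.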