QM | LG | Apr 19, 2022

G2GT: Retrosynthesis Prediction with Graph to Graph Attention Neural Network and Self-Training

Zaiyun Lin, Shiqiu Yin, Lei Shi, Wenbiao Zhou, YingSheng Zhang

arXiv:2204.08608v1 | 6.69 citations | h-index: 8

Originality: Incremental advance

AI Analysis

The paper addresses retrosynthesis prediction, a fundamental challenge in organic chemistry, and reports incremental improvements in model performance.

The paper tackles retrosynthesis prediction by proposing G2GT, a graph-to-graph transformer model combined with self-training and a weak ensemble method. It reports state-of-the-art top-1 accuracies of 54% on USPTO-50K and 50% on USPTO-full.

Retrosynthesis prediction is one of the fundamental challenges in organic chemistry and related fields. The goal is to find reactant molecules that can synthesize a given product molecule. To solve this task, we propose a new graph-to-graph transformation model, G2GT, in which the graph encoder and graph decoder are built upon the standard transformer structure. We also show that self-training, a powerful data augmentation method that utilizes unlabeled molecule data, can significantly improve the model's performance. Inspired by the reaction type label and ensemble learning, we propose a novel weak ensemble method to enhance diversity. We combine beam search, nucleus sampling, and top-k sampling to further improve inference diversity and propose a simple ranking algorithm to retrieve the final top-10 results. We achieve new state-of-the-art results on both the USPTO-50K dataset, with a top-1 accuracy of 54%, and the larger USPTO-full dataset, with a top-1 accuracy of 50%, along with competitive top-10 results.
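
The sampling and ranking steps mentioned in the abstract can be illustrated with a minimal Python sketch. The top-k and nucleus filters follow their standard definitions. The ranking function is a hypothetical stand-in, since the abstract does not describe the paper's ranking algorithm: it pools candidates from several decoding runs and orders them by how many runs produced them, then by best score. All function names, the SMILES strings, and the scores are assumptions made for illustration.

```python
from collections import defaultdict


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def nucleus_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def rank_candidates(candidate_lists, top_n=10):
    """Hypothetical ranking (not the paper's algorithm): order pooled
    candidates by vote count across decoding runs, then by best score."""
    votes = defaultdict(int)
    best = {}
    for candidates in candidate_lists:
        for smiles, score in candidates:
            votes[smiles] += 1
            best[smiles] = max(score, best.get(smiles, float("-inf")))
    ranked = sorted(votes, key=lambda s: (votes[s], best[s]), reverse=True)
    return ranked[:top_n]


# Illustrative inputs only: a toy next-token distribution and made-up
# (SMILES, log-probability) candidates from two decoding strategies.
probs = [0.5, 0.3, 0.15, 0.05]
beam_out = [("CCO", -0.1), ("CC=O", -0.5)]
sampled_out = [("CC=O", -0.4), ("CCN", -0.9)]
top10 = rank_candidates([beam_out, sampled_out])
```

In this sketch a candidate proposed by more than one decoding strategy is ranked ahead of candidates proposed by only one, which is one simple way to merge diverse outputs into a single top-10 list.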
