MARS: A Motif-based Autoregressive Model for Retrosynthesis Prediction
This work addresses a key bottleneck in drug discovery by improving retrosynthesis prediction, though it is incremental as it builds on existing graph-generation approaches.
The paper tackles retrosynthesis prediction for drug discovery by proposing a motif-based autoregressive model that sequentially identifies the reaction center, generates synthons, and adds motifs to produce reactants. It reports significantly outperforming previous state-of-the-art algorithms on a benchmark dataset.
Retrosynthesis is a major task in drug discovery. Many existing approaches formulate it as a graph-generation problem. Specifically, these methods first identify the reaction center and break the target molecule accordingly to generate synthons. Reactants are then generated either by adding atoms sequentially to the synthon graphs or by directly adding proper leaving groups. However, both strategies have drawbacks: adding atoms results in a long prediction sequence, which increases generation difficulty, while adding leaving groups can only consider those seen in the training set, which results in poor generalization. In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Since chemically meaningful motifs are bigger than atoms and smaller than leaving groups, our method enjoys lower prediction complexity than adding atoms and better generalization than adding leaving groups. Experiments on a benchmark dataset show that the proposed model significantly outperforms previous state-of-the-art algorithms.
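To make the pipeline in the abstract concrete, the following is a minimal toy sketch, not the MARS model itself. It assumes a molecule is just a list of atom indices plus a bond list, that the reaction center is a single given bond rather than a learned prediction, and that motifs can be matched as contiguous runs of atom symbols. The function names `break_bond` and `count_generation_steps`, the example molecule, and the motif vocabulary are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def break_bond(num_atoms, bonds, center):
    """Remove the reaction-center bond and return the resulting
    connected components (the synthons) as sorted atom-index lists."""
    remaining = [b for b in bonds if set(b) != set(center)]
    adjacency = defaultdict(set)
    for a, b in remaining:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, synthons = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        synthons.append(sorted(component))
    return synthons


def count_generation_steps(atoms, motifs):
    """Count autoregressive steps needed to emit `atoms` when each step
    may add one motif (longest match first) or fall back to one atom."""
    by_length = sorted(motifs, key=len, reverse=True)
    steps, i = 0, 0
    while i < len(atoms):
        for motif in by_length:
            if tuple(atoms[i:i + len(motif)]) == tuple(motif):
                i += len(motif)
                break
        else:
            i += 1  # no motif matched: add a single atom
        steps += 1
    return steps


# Hypothetical 4-atom chain 0-1-2-3 with the reaction center on bond (1, 2).
synthons = break_bond(4, [(0, 1), (1, 2), (2, 3)], (1, 2))

# Hypothetical fragment to attach to a synthon, as a flat run of atom symbols.
fragment = ["C", "O", "O", "C", "C", "C"]
motif_vocab = [("C", "O", "O"), ("C", "C", "C")]

atom_steps = count_generation_steps(fragment, [])            # one atom per step
motif_steps = count_generation_steps(fragment, motif_vocab)  # one motif per step
```

In this toy case, breaking the bond yields two synthons, and the six-atom fragment takes six steps atom by atom but only two steps with the made-up motif vocabulary. This mirrors the abstract's argument that motifs shorten the prediction sequence, while the single-atom fallback hints at why a motif vocabulary can still cover fragments that a fixed set of whole leaving groups would miss.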