LG · AI · Feb 12, 2022

SemiRetro: Semi-template framework boosts deep retrosynthesis prediction

arXiv:2202.08205v1 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the scalability and accuracy trade-off in retrosynthesis prediction for chemistry and drug discovery, representing a hybrid incremental improvement.

The authors tackled retrosynthesis prediction by combining template-based and template-free approaches into a semi-template framework, achieving 98.9% data coverage with only 150 semi-templates and improving top-1 accuracy by 4.8-6.0% over the template-free G2G baseline.

Recently, template-based (TB) and template-free (TF) molecule graph learning methods have shown promising results in retrosynthesis. TB methods are more accurate because they use pre-encoded reaction templates, and TF methods are more scalable because they decompose retrosynthesis into subproblems, i.e., center identification and synthon completion. To combine the advantages of both TB and TF, we suggest breaking a full-template into several semi-templates and embedding them into the two-step TF framework. Since many semi-templates are duplicated, the template redundancy can be reduced while the essential chemical knowledge is still preserved to facilitate synthon completion. We call our method SemiRetro, introduce a new GNN layer (DRGAT) to enhance center identification, and propose a novel self-correcting module to improve semi-template classification. Experimental results show that SemiRetro significantly outperforms both existing TB and TF methods. In scalability, SemiRetro covers 98.9% of the data using 150 semi-templates, while the previous template-based GLN requires 11,647 templates to cover 93.3% of the data. In top-1 accuracy, SemiRetro exceeds the template-free G2G by 4.8% (class known) and 6.0% (class unknown). In addition, SemiRetro has better training efficiency than existing methods.
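
The redundancy argument in the abstract can be illustrated with a toy sketch. It assumes each full template can be written as a tuple of semi-template labels, one per synthon; the labels, the `reactions` list, and the `coverage` helper are hypothetical placeholders for illustration. The sketch does not reproduce the paper's template extraction procedure or its reported numbers.

```python
from collections import Counter

# Hypothetical toy data: each reaction is described by one full template,
# written as a tuple of opaque semi-template labels (one per synthon).
# The labels are placeholders, not real reaction templates.
reactions = [
    ("add_OH", "add_Cl"),
    ("add_OH", "add_Br"),
    ("add_Cl", "add_Br"),
    ("add_OH",),
    ("add_Cl",),
    ("add_Br", "add_I"),
]


def coverage(reactions, k):
    """Fraction of reactions whose semi-templates all lie in the k most frequent."""
    counts = Counter(s for full in reactions for s in full)
    kept = {s for s, _ in counts.most_common(k)}
    covered = sum(all(s in kept for s in full) for full in reactions)
    return covered / len(reactions)


# Distinct full templates versus distinct semi-templates in the toy data.
full_templates = set(reactions)
semi_templates = {s for full in reactions for s in full}

print(len(full_templates), "full templates")
print(len(semi_templates), "semi-templates")
print("coverage with top-3 semi-templates:", coverage(reactions, 3))
```

In this toy data the same few semi-template labels recur across different full templates, so the semi-template vocabulary is smaller than the full-template vocabulary, and a truncated vocabulary still covers most reactions.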

