LG · Nov 25, 2021

Fragment-based molecular generative model with high generalization ability and synthetic accessibility

arXiv:2111.12907v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of designing synthesizable molecules with specific properties for drug discovery, representing an incremental improvement over atom-based methods.

The authors propose a fragment-based generative model that builds molecules with desired properties by sequentially adding molecular fragments. The model controls multiple target properties at a high success rate, and the authors demonstrate a practical application by generating potential SARS-CoV-2 inhibitors with high binding affinities, as measured by docking score.

Deep generative models are attracting great attention for designing molecules with desired properties. Most existing models generate molecules by sequentially adding atoms. This often yields molecules that correlate weakly with the target properties and have low synthetic accessibility. Molecular fragments such as functional groups are more closely related to molecular properties and synthetic accessibility than atoms are. Here, we propose a fragment-based molecular generative model that designs new molecules with target properties by sequentially adding molecular fragments to any given starting molecule. A key feature of our model is its high generalization ability in terms of both property control and fragment types. The former is made possible by learning the contribution of individual fragments to the target properties in an auto-regressive manner. For the latter, we use a deep neural network that predicts the bonding probability of two molecules from their embedding vectors. The high synthetic accessibility of the generated molecules is considered implicitly, by preparing the fragment library with the BRICS decomposition method. We show that the model can generate molecules while controlling multiple target properties simultaneously at a high success rate. It also works equally well with unseen fragments, even in property ranges where training data is rare, which verifies its high generalization ability. As a practical application, we demonstrate that the model can generate potential inhibitors with high binding affinities, as measured by docking score, against the 3CL protease of SARS-CoV-2.
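
The generation loop described in the abstract (score candidate fragments against the current molecule from their embedding vectors, then append one) can be illustrated with a toy sketch. This is not the authors' implementation. The logistic scoring function, the mean-of-fragments molecule embedding, the stopping rule, and every name and number below are assumptions made for illustration, and the property conditioning and the BRICS-based library preparation are left out.

```python
import math
import random


def bonding_probability(emb_a, emb_b, weights, bias=0.0):
    # Toy stand-in for the learned network: a logistic score over the
    # elementwise product of the two embedding vectors (an assumption).
    score = bias + sum(w * a * b for w, a, b in zip(weights, emb_a, emb_b))
    return 1.0 / (1.0 + math.exp(-score))


def molecule_embedding(fragment_names, table):
    # Hypothetical molecule embedding: the mean of its fragment vectors.
    vectors = [table[name] for name in fragment_names]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def grow_molecule(start, table, weights, max_steps=3, stop_below=0.5, seed=0):
    # Sequentially append fragments, sampling in proportion to bonding probability.
    rng = random.Random(seed)
    molecule = [start]
    for _ in range(max_steps):
        current = molecule_embedding(molecule, table)
        probs = {name: bonding_probability(current, vec, weights)
                 for name, vec in table.items()}
        if max(probs.values()) < stop_below:
            break  # no candidate is likely enough to bond
        names = list(probs)
        picked = rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
        molecule.append(picked)
    return molecule


# Made-up two-dimensional embeddings, for illustration only.
FRAGMENTS = {"ring": [1.0, 0.2], "amide": [0.8, -0.1], "ether": [-0.5, 0.9]}
WEIGHTS = [1.5, 1.0]
print(grow_molecule("ring", FRAGMENTS, WEIGHTS))
```

Because the score depends only on embedding vectors, a fragment that was absent when WEIGHTS was chosen can be added to the table and scored without changing the scoring function. That mirrors, in a very simplified way, the abstract's claim about handling unseen fragments.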

Code Implementations: 2 repos