BMCLLG
Jul 2, 2024

Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets

arXiv:2407.13780v1
1 citation
h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating viable small molecules for drug discovery, offering incremental improvements in validity and docking performance for specific protein targets.

The authors tackled the challenge of generating syntactically valid and chemically plausible small molecules by proposing a generative latent-variable transformer model with a modified SAFE representation, achieving >90% validity and <1% fragmentation rates. They fine-tuned the model using reinforcement learning for protein targets, nearly doubling hit candidates for some targets and matching or slightly outperforming state-of-the-art docking scores on three out of five targets.

A specific challenge with deep learning approaches for molecule generation is generating both syntactically valid and chemically plausible molecular string representations. To address this, we propose a novel generative latent-variable transformer model for small molecules that leverages a recently proposed molecular string representation called SAFE. We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training and use this to train our model. Our experiments show that our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space. By fine-tuning the model using reinforcement learning to improve molecular docking, we significantly increase the number of hit candidates for five specific protein targets compared to the pre-trained model, nearly doubling this number for certain targets. Additionally, our top 5% mean docking scores are comparable to the current state-of-the-art (SOTA), and we marginally outperform SOTA on three of the five targets.
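The metrics named in the abstract (validity rate, fragmentation rate, hit candidates, and top 5% mean docking score) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code, and several points are assumptions: the inputs are taken to be decoded, toolkit-written SMILES strings in which "." only separates disconnected components (raw SAFE strings use dots between connected fragments, so the check would not apply to them directly); `is_valid` is a hypothetical placeholder for a real cheminformatics parser; docking scores are assumed to follow the common convention that lower is better; and a "hit" is simplified here to a single score threshold, whereas the paper's actual hit criteria are not given in this abstract.

```python
from typing import Callable, Sequence


def validity_rate(smiles: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated strings accepted by `is_valid`.

    `is_valid` is a hypothetical stand-in for a real parser check.
    """
    if not smiles:
        raise ValueError("need at least one generated string")
    return sum(1 for s in smiles if is_valid(s)) / len(smiles)


def fragmentation_rate(smiles: Sequence[str]) -> float:
    """Fraction of strings that encode more than one disconnected component.

    Assumes toolkit-written SMILES, where '.' separates disconnected parts.
    """
    if not smiles:
        raise ValueError("need at least one generated string")
    return sum(1 for s in smiles if "." in s) / len(smiles)


def top_fraction_mean(scores: Sequence[float], fraction: float = 0.05) -> float:
    """Mean of the best `fraction` of docking scores (assumed: lower is better)."""
    if not scores:
        raise ValueError("need at least one score")
    k = max(1, round(len(scores) * fraction))
    best = sorted(scores)[:k]  # ascending order puts the lowest scores first
    return sum(best) / k


def hit_count(scores: Sequence[float], threshold: float) -> int:
    """Number of molecules scoring at or below an assumed hit threshold."""
    return sum(1 for s in scores if s <= threshold)
```

Comparing `hit_count` for samples from the pre-trained model against samples from the fine-tuned model, under the same threshold, gives the kind of before-and-after comparison the abstract describes.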

