QM, LG | Aug 23, 2022

Retrieval-based Controllable Molecule Generation

arXiv:2208.11126v3 | 50 citations | h-index: 108 | Has Code
Originality: Highly original
AI Analysis

This paper addresses the challenge of data scarcity in drug discovery by enabling controllable molecule generation with minimal training, offering a practical solution for real-world applications.

The paper tackles the problem of generating molecules with specified properties without requiring large datasets or task-specific fine-tuning. It introduces a retrieval-based framework that uses a small set of exemplar molecules to steer a pre-trained generative model, and it reports better performance and wider applicability than previous methods on tasks that include designing lead compounds that bind to the SARS-CoV-2 main protease.

Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, which is trained by a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and retrieval database for better generalization. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning. On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods. Code is available at https://github.com/NVlabs/RetMol.
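
The retrieve, fuse, and refine loop described in the abstract can be illustrated with a minimal toy sketch. This is not the RetMol implementation: molecules are stood in for by plain float vectors, similarity is assumed to be cosine similarity, the learned fusion module is replaced by a simple weighted average, and the pre-trained generator is replaced by Gaussian sampling around the fused vector. The names `retrieve`, `fuse`, `refine`, and all parameters are hypothetical and chosen only for illustration; the self-supervised nearest-neighbor training objective is not modeled.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]  # toy stand-in for a molecule embedding


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, database: Sequence[Vector], k: int) -> List[Vector]:
    """Return the k database entries most similar to the query."""
    return sorted(database, key=lambda e: cosine(query, e), reverse=True)[:k]


def fuse(query: Vector, exemplars: Sequence[Vector], weight: float = 0.5) -> Vector:
    """Assumed fusion rule: blend the query with the mean of the exemplars."""
    if not exemplars:
        return query
    mean = [sum(e[i] for e in exemplars) / len(exemplars) for i in range(len(query))]
    return tuple((1 - weight) * q + weight * m for q, m in zip(query, mean))


def refine(
    query: Vector,
    database: Sequence[Vector],
    score: Callable[[Vector], float],
    rounds: int = 10,
    k: int = 3,
    n_samples: int = 8,
    noise: float = 0.1,
    seed: int = 0,
) -> Tuple[Vector, List[Vector]]:
    """Iteratively update both the current best candidate and the database."""
    rng = random.Random(seed)
    db = list(database)
    best = query
    for _ in range(rounds):
        fused = fuse(best, retrieve(best, db, k))
        # Toy "generator": sample noisy candidates around the fused vector.
        candidates = [
            tuple(x + rng.gauss(0, noise) for x in fused) for _ in range(n_samples)
        ]
        # Candidates that beat the current best join the retrieval database.
        improved = [c for c in candidates if score(c) > score(best)]
        db.extend(improved)
        best = max(improved + [best], key=score)
    return best, db


if __name__ == "__main__":
    target = (1.0, 1.0)  # hypothetical design criterion
    closeness = lambda v: -math.dist(v, target)
    start = (0.0, 0.2)
    exemplars = [(0.5, 0.4), (0.3, 0.6)]  # only partially satisfy the criterion
    best, db = refine(start, exemplars, closeness)
    print("start score:", closeness(start))
    print("best score:", closeness(best))
    print("database size:", len(db))
```

Because improved candidates are written back into the database, later rounds can retrieve exemplars that were never in the original set, which is the intuition behind extrapolating beyond the retrieval database.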

Code Implementations: 2 repos
