LG, AI, CE, CL · Jan 26, 2023

Domain-Agnostic Molecular Generation with Chemical Feedback

arXiv:2301.11259v6 · 33 citations · h-index: 32 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating syntactically and chemically valid molecules for chemical and drug design. It is an incremental improvement over existing methods.

The paper tackles the challenge of generating diverse, feasible molecules with desired properties. It introduces MolGen, a pre-trained molecular language model that uses domain-agnostic prefix tuning for knowledge transfer across domains and a chemical feedback paradigm to steer the model away from molecular hallucinations. The authors report optimization results on the penalized logP, QED, and molecular docking benchmarks.

The generation of molecules with desired properties has become increasingly popular, revolutionizing the way scientists design molecular structures and providing valuable support for chemical and drug design. However, despite the potential of language models in molecule generation, they face challenges such as generating syntactically or chemically flawed molecules, having narrow domain focus, and struggling to create diverse and feasible molecules due to limited annotated data or external molecular databases. To tackle these challenges, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation. Through the reconstruction of over 100 million molecular SELFIES, MolGen internalizes structural and grammatical insights. This is further enhanced by domain-agnostic molecular prefix tuning, fostering robust knowledge transfer across diverse domains. Importantly, our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences. Extensive experiments on well-known benchmarks underscore MolGen's optimization capabilities in properties such as penalized logP, QED, and molecular docking. Additional analyses confirm its proficiency in accurately capturing molecule distributions, discerning intricate structural patterns, and efficiently exploring the chemical space. Code is available at https://github.com/zjunlp/MolGen.
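The abstract describes chemical feedback as aligning the model's estimated probabilities with real-world chemical preferences, without giving a formula. One way such an alignment could be expressed is a pairwise ranking loss over candidate molecules. The sketch below is an illustration under that assumption, not the authors' implementation: the function name `pairwise_rank_loss`, the `margin` parameter, and the rank-scaled gap are all hypothetical, and the log-probabilities and property scores are taken as given inputs.

```python
from typing import Sequence


def pairwise_rank_loss(log_probs: Sequence[float],
                       scores: Sequence[float],
                       margin: float = 0.1) -> float:
    """Hinge-style ranking loss over candidate molecules (illustrative only).

    log_probs[i]: hypothetical model log-probability of candidate i.
    scores[i]:    property score of candidate i (higher is better),
                  e.g. a QED or penalized logP value computed elsewhere.
    The loss is zero when every better-scoring candidate is also more
    probable under the model by at least the required gap.
    """
    if len(log_probs) != len(scores):
        raise ValueError("log_probs and scores must have the same length")

    # Candidate indices sorted from best to worst property score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    loss = 0.0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            better, worse = order[a], order[b]
            if scores[better] == scores[worse]:
                continue  # tied candidates carry no preference
            # Assumption: demand a larger gap for pairs further apart in rank.
            required_gap = margin * (b - a)
            loss += max(0.0, log_probs[worse] - log_probs[better] + required_gap)
    return loss


if __name__ == "__main__":
    # Model probabilities agree with the property ranking.
    print(pairwise_rank_loss([-1.0, -2.0, -3.0], [3.0, 2.0, 1.0]))
    # Model prefers the worst-scoring candidate, so the loss is positive.
    print(pairwise_rank_loss([-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]))
```

Minimizing a loss of this shape would push probability mass toward candidates that score better on the chosen property, which is one reading of how feedback might discourage chemically implausible outputs.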

Code Implementations: 1 repo