LG · CL · May 22, 2025

A Survey of Large Language Models for Text-Guided Molecular Discovery: from Molecule Generation to Optimization

arXiv:2505.16094v1 · 8 citations · h-index: 4 · Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of integrating LLMs into molecular science, but as a survey its contribution is incremental rather than original research.

This survey reviews the emerging use of large language models (LLMs) for text-guided molecular discovery, focusing on molecule generation and optimization, and provides a taxonomy, analysis of techniques, datasets, and future directions.

Large language models (LLMs) are introducing a paradigm shift in molecular discovery by enabling text-guided interaction with chemical spaces through natural language and symbolic notations, with emerging extensions that incorporate multi-modal inputs. To advance the new field of LLMs for molecular discovery, this survey provides an up-to-date and forward-looking review of the emerging use of LLMs for two central tasks: molecule generation and molecule optimization. Based on our proposed taxonomy for both problems, we analyze representative techniques in each category, highlighting how LLM capabilities are leveraged across different learning settings. In addition, we cover the commonly used datasets and evaluation protocols. We conclude by discussing key challenges and future directions, positioning this survey as a resource for researchers working at the intersection of LLMs and molecular science. A continuously updated reading list is available at https://github.com/REAL-Lab-NU/Awesome-LLM-Centric-Molecular-Discovery.

Code Implementations: 1 repo
