CL | CHEM-PH | BM | Jun 21, 2023

Interactive Molecular Discovery with Natural Language

arXiv:2306.11976v1 | 3 citations | h-index: 31
Originality: Incremental advance
AI Analysis

This work targets the high technical barrier that biochemistry tasks such as property prediction and molecule mining pose for researchers and practitioners. Its contribution is incremental: it applies natural-language interaction to a specific domain.

The authors propose conversational molecular design, a task in which molecules are described and edited through natural language. To address it, they develop ChatMol, a generative pre-trained model enhanced with experimental property information and molecular spatial knowledge. In their evaluations, ChatMol outperformed other solutions, including ChatGPT.

Natural language is expected to be a key medium for human-machine interaction in the era of large language models. In biochemistry, tasks centered on molecules (e.g., property prediction and molecule mining) are highly significant but carry a high technical threshold. Bridging the expression of molecules in natural language and chemical language can not only greatly improve the interpretability of these tasks and reduce their operational difficulty, but also fuse the chemical knowledge scattered across complementary materials for a deeper comprehension of molecules. Motivated by these benefits, we propose conversational molecular design, a novel task that adopts natural language for describing and editing target molecules. To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information, molecular spatial knowledge, and the associations between natural and chemical languages. Several typical solutions, including large language models (e.g., ChatGPT), are evaluated, demonstrating both the difficulty of conversational molecular design and the effectiveness of our knowledge enhancement method. Case observations and analysis are conducted to provide directions for further exploration of natural-language interaction in molecular discovery.

Code Implementations: 1 repo
