Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery
This review addresses the problem of accelerating drug discovery with NLP techniques. Its contribution is incremental, since it surveys existing advances rather than presenting new research.
This review explores how natural language processing (NLP) methodologies are applied to text-based representations of chemicals and proteins to predict molecular properties and design novel molecules, aiming to bridge the gap between medicinal chemists and computer scientists in drug discovery.
Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge. Advances in natural language processing (NLP) methodologies for spoken languages have accelerated the application of NLP to these representations, first to uncover the hidden knowledge they encode about biochemical entities and then to build models that predict molecular properties or design novel molecules. This review outlines the impact of these advances on drug discovery and aims to further the dialogue between medicinal chemists and computer scientists.
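To make the idea of treating a chemical as text concrete, the sketch below splits a molecule's line notation into tokens, much as a sentence is split into words before NLP modeling. It assumes the SMILES notation as the text-based representation, which the abstract does not name. The regular expression is a simplified illustration, not a full SMILES grammar, and the names `SMILES_TOKEN` and `tokenize_smiles` are hypothetical.

```python
import re
from collections import Counter

# Simplified token pattern (an assumption, not a complete SMILES grammar):
# bracket atoms, two-letter halogens, two-digit ring closures,
# single-letter atoms, ring-closure digits, then bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/():.@~*$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so check that
    # the tokens reassemble into the original string.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize_smiles(aspirin)

# Token counts play the role of word frequencies in a text corpus.
vocabulary = Counter(tokens)
print(tokens)
print(vocabulary.most_common(3))
```

Once a molecule is a token sequence, techniques developed for sentences, such as n-gram statistics, embeddings, or sequence models, can be applied to it in principle. Which tokenization a given model uses remains a design choice.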