Language models in molecular discovery
This review serves as a resource for researchers, chemists, and AI enthusiasts interested in using language models to accelerate chemical discovery. Its contribution is incremental, however: it synthesizes existing work without introducing new methods or data.
The paper reviews the role of language models in molecular discovery, highlighting their applications in de novo drug design, property prediction, and reaction chemistry. It argues that these models accelerate the molecule discovery cycle, citing promising recent findings in early-stage drug discovery.
The success of language models, especially transformer-based architectures, has trickled into other domains, giving rise to "scientific language models" that operate on small molecules, proteins, or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle, as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction, and reaction chemistry. We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling. Finally, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.