LG · CL · BM · Aug 14, 2023

GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text

arXiv:2308.06911v3 · 140 citations · h-index: 78
Originality: Highly original
AI Analysis

GIT-Mol addresses the need for better multi-modal processing in molecular science, offering incremental improvements for researchers in chemistry and drug discovery.

Existing language models cannot capture the information in complex molecular structures or images. The paper addresses this by introducing GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information. Compared to baselines, it reports a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information with complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates the Graph, Image, and Text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture that is capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in properties prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.

Code Implementations: 1 repo
