LG | May 22, 2025

ChemMLLM: Chemical Multimodal Large Language Model

arXiv:2505.16326v2 | 13 citations | h-index: 16 | Has Code
Originality: Incremental advance
AI Analysis

ChemMLLM addresses a gap in chemical AI by enabling multimodal molecule tasks for researchers and practitioners. The work appears incremental, however, because it adapts existing MLLM paradigms to a specific domain.

The paper addresses the lack of chemical multimodal large language models (MLLMs) for cross-modal understanding and generation by proposing ChemMLLM. The authors report that it outperforms the baselines on all five multimodal tasks; in molecule image optimization, for example, it beats GPT-4o by 116.75% (4.27 vs 1.97 property improvement).

Multimodal large language models (MLLMs) have made impressive progress in many applications in recent years. However, chemical MLLMs that can handle cross-modal understanding and generation remain underexplored. To fill this gap, we propose ChemMLLM, a unified chemical multimodal large language model for molecule understanding and generation. We also design five multimodal tasks spanning text, molecular SMILES strings, and images, and curate the corresponding datasets. We benchmark ChemMLLM against a range of leading general-purpose MLLMs and chemical LLMs on these tasks. Experimental results show that ChemMLLM achieves superior performance across all evaluated tasks. For example, in the molecule image optimization task, ChemMLLM outperforms the best baseline (GPT-4o) by 116.75% (4.27 vs 1.97 property improvement). The code is publicly available at https://github.com/bbsbz/ChemMLLM.git.
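The 116.75% figure in the abstract is consistent with the standard relative-improvement formula applied to the two reported scores (4.27 and 1.97). The sketch below only checks that arithmetic; the abstract does not state the formula explicitly, and the function name `relative_improvement` is a hypothetical helper, not something taken from the ChemMLLM repository.

```python
def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Return the percentage by which model_score exceeds baseline_score.

    Assumption: the abstract's percentage is (model - baseline) / baseline * 100.
    """
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return (model_score - baseline_score) / abs(baseline_score) * 100.0


# Property-improvement scores quoted in the abstract.
chemmllm_score = 4.27
gpt4o_score = 1.97

gain = relative_improvement(chemmllm_score, gpt4o_score)
print(f"{gain:.2f}%")  # prints "116.75%"
```

Put differently, ChemMLLM's reported property improvement is a little over twice that of GPT-4o (4.27 / 1.97 ≈ 2.17).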

Code Implementations: 1 repo
