CV, Jul 4, 2025

MolVision: Molecular Property Prediction with Vision Language Models

arXiv:2507.03283v1, 3 citations, h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses molecular property prediction for drug discovery and materials science, representing an incremental advancement by combining visual and textual data in VLMs.

The paper tackles molecular property prediction by introducing MolVision, a method that uses Vision-Language Models (VLMs) to integrate molecular structure images with textual descriptions. It finds that multimodal fusion significantly enhances generalization across properties, with further gains from efficient fine-tuning strategies such as LoRA.

Molecular property prediction is a fundamental task in computational chemistry with critical applications in drug discovery and materials science. While recent works have explored Large Language Models (LLMs) for this task, they primarily rely on textual molecular representations such as SMILES/SELFIES, which can be ambiguous and structurally less informative. In this work, we introduce MolVision, a novel approach that leverages Vision-Language Models (VLMs) by integrating both molecular structure images and textual descriptions to enhance property prediction. We construct a benchmark spanning ten diverse datasets, covering classification, regression, and description tasks. Evaluating nine different VLMs in zero-shot, few-shot, and fine-tuned settings, we find that visual information improves prediction performance, particularly when combined with efficient fine-tuning strategies such as LoRA. Our results reveal that while visual information alone is insufficient, multimodal fusion significantly enhances generalization across molecular properties. Adapting the vision encoder to molecular images in conjunction with LoRA further improves performance. The code and data are available at https://molvision.github.io/MolVision/.
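The abstract names LoRA as the efficient fine-tuning strategy but does not state the rank, scaling factor, or which layers are adapted. As a rough illustration of the general technique only, the sketch below applies a low-rank update to a frozen weight matrix, h = W x + (alpha / r) B (A x), using plain Python lists. The function names, matrix sizes, and the values of `alpha` and `r` are illustrative assumptions, not MolVision's configuration.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha):
    """Generic LoRA forward pass (hypothetical helper, not from the paper).

    w: frozen weight, shape d_out x d_in
    a: trainable down-projection, shape r x d_in
    b: trainable up-projection, shape d_out x r
    x: input vector of length d_in
    alpha: scaling hyperparameter (assumed value, chosen by the user)
    """
    r = len(a)                       # adapter rank = number of rows in A
    base = matvec(w, x)              # frozen path: W x
    low = matvec(b, matvec(a, x))    # trainable path: B (A x)
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, low)]


def trainable_params(d_out, d_in, r):
    """Number of adapter parameters: A has r * d_in, B has d_out * r."""
    return r * (d_in + d_out)


# Tiny worked example with assumed numbers.
w = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2 x 2 identity
a = [[1.0, 1.0]]               # rank r = 1
b = [[1.0], [2.0]]
x = [1.0, 2.0]
out = lora_forward(w, a, b, x, alpha=2.0)

# With B initialised to zero (the usual LoRA starting point),
# the adapted layer reproduces the frozen layer exactly.
b_zero = [[0.0], [0.0]]
out_init = lora_forward(w, a, b_zero, x, alpha=2.0)
```

For a 4096 x 4096 projection with rank 8, `trainable_params(4096, 4096, 8)` gives 65,536 adapter parameters against 16,777,216 in the full matrix, which is why the approach is described as efficient.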

