Local-Global Multimodal Contrastive Learning for Molecular Property Prediction
This work addresses accurate molecular property prediction. The contribution is incremental, as it builds on existing multimodal and contrastive learning approaches in chemistry.
The paper tackles molecular property prediction by integrating molecular structure and chemical semantics through a local-global multimodal contrastive learning framework, and reports consistent and competitive performance on MoleculeNet benchmarks.
Accurate molecular property prediction requires integrating complementary information from molecular structure and chemical semantics. In this work, we propose LGM-CL, a local-global multimodal contrastive learning framework that jointly models molecular graphs and textual representations derived from SMILES and chemistry-aware augmented texts. Local functional group information and global molecular topology are captured using AttentiveFP and Graph Transformer encoders, respectively, and aligned through self-supervised contrastive learning. In addition, chemically enriched textual descriptions are contrasted with original SMILES to incorporate physicochemical semantics in a task-agnostic manner. During fine-tuning, molecular fingerprints are further integrated via Dual Cross-attention multimodal fusion. Extensive experiments on MoleculeNet benchmarks demonstrate that LGM-CL achieves consistent and competitive performance across both classification and regression tasks, validating the effectiveness of unified local-global and multimodal representation learning.
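As an illustration of the local-global alignment step, the following is a minimal sketch of a symmetric InfoNCE-style contrastive loss between two sets of molecule embeddings. The abstract does not specify the exact objective, so the choice of InfoNCE, the temperature value, and all function names (`l2_normalize`, `info_nce`, `symmetric_contrastive_loss`) are assumptions made for illustration, not the authors' implementation. The embeddings are plain Python lists standing in for the outputs of the local (AttentiveFP) and global (Graph Transformer) encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Row i of `anchors` and row i of `positives` are two views of the same
    molecule (the positive pair); every other row in the batch serves as a
    negative. `temperature` is a hypothetical hyperparameter.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(local_emb, global_emb, temperature=0.1):
    """Average the loss in both directions (local to global, global to local)."""
    return 0.5 * (
        info_nce(local_emb, global_emb, temperature)
        + info_nce(global_emb, local_emb, temperature)
    )


# Toy batch of two molecules with 2-dimensional embeddings (made-up values).
local_view = [[1.0, 0.0], [0.0, 1.0]]
global_view_aligned = [[1.0, 0.1], [0.1, 1.0]]
global_view_swapped = [[0.1, 1.0], [1.0, 0.1]]

loss_aligned = symmetric_contrastive_loss(local_view, global_view_aligned)
loss_swapped = symmetric_contrastive_loss(local_view, global_view_swapped)
```

In this toy batch, `loss_aligned` comes out lower than `loss_swapped`, because each local embedding is most similar to its own global counterpart. Under the same assumptions, the function could also be applied to the second contrast described in the abstract, between embeddings of the original SMILES and of the chemically enriched text.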