LG · AI · CL · IR · QM · Jul 22, 2023

Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

arXiv:2307.12996v1 · 3 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the challenge of leveraging textual scientific knowledge for improved molecular property prediction in computational biochemistry, representing an incremental advance with specific gains.

The paper tackles the problem of transferring molecular property information from natural language to graph representations using multimodal contrastive learning, achieving a +4.26% AUROC gain over graph-only pre-training and a +1.54% gain over a recent baseline on MoleculeNet tasks.

Deep learning in computational biochemistry has traditionally focused on neural representations of molecular graphs; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study the gains in property prediction performance obtained by using contrastive learning to align neural graph representations with representations of textual descriptions of the molecules' characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to the recently proposed MoMu model (Su et al. 2022), which is also trained contrastively on molecular graphs and text.
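
As an illustration of what aligning graph and text representations with contrastive learning can look like, here is a minimal sketch of a symmetric InfoNCE (CLIP-style) objective in plain Python. The abstract does not specify the exact loss, encoders, or temperature, so everything here is an assumption: the function names, the temperature of 0.1, and the toy two-dimensional embeddings are hypothetical and stand in for the outputs of a graph encoder and a text encoder.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(graph_emb, text_emb, temperature=0.1):
    """Generic symmetric InfoNCE loss over paired (graph, text) embeddings.

    Assumption: row i of graph_emb and row i of text_emb describe the same
    molecule; all other rows in the batch act as negatives.
    """
    g = [l2_normalize(v) for v in graph_emb]
    t = [l2_normalize(v) for v in text_emb]
    # Cosine similarity between every graph and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(gi, tj)) / temperature for tj in t]
              for gi in g]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-graph direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy embeddings (hypothetical): two molecules, two text descriptions.
graphs = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(graphs, [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss(graphs, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each graph embedding is closest to its own description and high when the pairing is shuffled, which is the signal that pulls the two modalities into a shared space during pre-training.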

