Feb 6, 2024

MolTC: Towards Molecular Relational Modeling In Language Models

arXiv:2402.03781v6 | 45 citations | h-index: 13 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This work addresses the challenge of underutilizing structural information in molecular interactions for biochemical research, offering a more efficient and effective approach, though it is incremental as it builds on existing LLM and CoT methods.

The paper tackles the problem of Molecular Relational Learning (MRL) by proposing MolTC, a novel LLM-based multi-modal framework that integrates graphical information from molecular pairs, achieving superior performance over existing GNN and LLM-based baselines across datasets with over 4,000,000 molecular pairs.

Molecular Relational Learning (MRL), which aims to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising route to efficient and effective MRL. Despite their potential, these methods rely predominantly on textual data and thus do not fully harness the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates this underutilization of information, as it hinders the sharing of interaction mechanisms learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrates the graph information of the two molecules in a pair. To train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and construct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.

Code Implementations: 1 repo