LGAICEJun 26, 2025

MolProphecy: Bridging Medicinal Chemists' Knowledge and Molecular Pre-Trained Models via a Multi-Modal Framework

arXiv:2507.02932v11 citationsh-index: 7Has CodeJ Adv Res
Originality Incremental advance
AI Analysis

This work addresses the gap in capturing expert reasoning in drug discovery, offering a practical tool for collaborative drug design, though it is incremental as it builds on existing pre-trained models with a novel integration method.

The paper tackles the problem of molecular property prediction by integrating chemists' domain knowledge into pre-trained models, achieving a 15.0% reduction in RMSE on FreeSolv and a 5.39% improvement in AUROC on BACE compared to state-of-the-art models.

MolProphecy is a human-in-the-loop (HITL) multi-modal framework designed to integrate chemists' domain knowledge into molecular property prediction models. While molecular pre-trained models have enabled significant gains in predictive accuracy, they often fail to capture the tacit, interpretive reasoning central to expert-driven molecular design. To address this, MolProphecy employs ChatGPT as a virtual chemist to simulate expert-level reasoning and decision-making. The generated chemist knowledge is embedded by the large language model (LLM) as a dedicated knowledge representation and then fused with graph-based molecular features through a gated cross-attention mechanism, enabling joint reasoning over human-derived and structural features. Evaluated on four benchmark datasets (FreeSolv, BACE, SIDER, and ClinTox), MolProphecy outperforms state-of-the-art (SOTA) models, achieving a 15.0 percent reduction in RMSE on FreeSolv and a 5.39 percent improvement in AUROC on BACE. Analysis reveals that chemist knowledge and structural features provide complementary contributions, improving both accuracy and interpretability. MolProphecy offers a practical and generalizable approach for collaborative drug discovery, with the flexibility to incorporate real chemist input in place of the current simulated proxy--without the need for model retraining. The implementation is publicly available at https://github.com/zhangruochi/MolProphecy.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes