Pengfei Liu

11.5LGFeb 11, 2024

Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey

Taojie Kuang, Pengfei Liu, Zhixiang Ren

The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential in enhancing molecular property prediction (MPP), especially improving accuracy and insights into molecular structures. Yet, two critical questions arise: does the integration of domain knowledge augment the accuracy of molecular property prediction and does employing multi-modal data fusion yield more precise results than unique data source methods? To explore these matters, we comprehensively review and quantitatively analyze recent deep learning methods based on various benchmarks. We discover that integrating molecular information significantly improves molecular property prediction (MPP) for both regression and classification tasks. Specifically, regression improvements, measured by reductions in root mean square error (RMSE), are up to 4.0%, while classification enhancements, measured by the area under the receiver operating characteristic curve (ROC-AUC), are up to 1.7%. We also discover that enriching 2D graphs with 1D SMILES boosts multi-modal learning performance for regression tasks by up to 9.1%, and augmenting 2D graphs with 3D information increases performance for classification tasks by up to 13.2%, with both enhancements measured using ROC-AUC. The two consolidated insights offer crucial guidance for future advancements in drug discovery.

5.9QMDec 28, 2018Code

Drug cell line interaction prediction

Pengfei Liu

Understanding the phenotypic drug response on cancer cell lines plays a vital rule in anti-cancer drug discovery and re-purposing. The Genomics of Drug Sensitivity in Cancer (GDSC) database provides open data for researchers in phenotypic screening to test their models and methods. Previously, most research in these areas starts from the fingerprints or features of drugs, instead of their structures. In this paper, we introduce a model for phenotypic screening, which is called twin Convolutional Neural Network for drugs in SMILES format (tCNNS). tCNNS is comprised of CNN input channels for drugs in SMILES format and cancer cell lines respectively. Our model achieves $0.84$ for the coefficient of determinant($R^2$) and $0.92$ for Pearson correlation($R_p$), which are significantly better than previous works\cite{ammad2014integrative,haider2015copula,menden2013machine}. Besides these statistical metrics, tCNNS also provides some insights into phenotypic screening.

Pengfei Liu

2 Papers