BMLGQM, Apr 3, 2023

Development and Evaluation of Conformal Prediction Methods for QSAR

arXiv:2304.00970v1
4 citations
h-index: 48
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable uncertainty estimates in QSAR modeling, where predictions are used to optimize molecular structures and prioritize compounds in drug discovery. It is an incremental advance that applies conformal prediction to specific ML models in this domain.

The paper tackles uncertainty estimation for QSAR regression models that predict the biological activities of compounds. It proposes computationally efficient conformal prediction algorithms tailored to advanced ML models such as Deep Neural Networks and Gradient Boosting Machines, and demonstrates their validity and efficiency on diverse QSAR datasets and in simulation studies.

The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting the biological activities of compounds from their molecular descriptors. Predictions from QSAR models can help, for example, to optimize molecular structure, prioritize compounds for further experimental testing, and estimate their toxicity. In addition to an accurate estimate of the activity, it is highly desirable to obtain an estimate of the uncertainty associated with the prediction, e.g., a prediction interval (PI) that contains the true molecular activity with a pre-specified probability, say 70%, 90% or 95%. The challenge is that most machine learning (ML) algorithms that achieve superior predictive performance require add-on methods to estimate the uncertainty of their predictions. The development of such methods is an active area of research in the statistical and ML communities, but their implementation for QSAR modeling remains limited. Conformal prediction (CP) is a promising approach. It is agnostic to the prediction algorithm and can produce valid prediction intervals under weak assumptions on the data distribution. We propose computationally efficient CP algorithms tailored to the most advanced ML models, including Deep Neural Networks and Gradient Boosting Machines. The validity and efficiency of the proposed conformal predictors are demonstrated on a diverse collection of QSAR datasets as well as in simulation studies.
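
To make the idea of a model-agnostic prediction interval concrete, here is a minimal sketch of generic split (inductive) conformal prediction for regression. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The function name split_conformal_interval, the synthetic "descriptor" data, and the least-squares model standing in for a DNN or GBM are all assumptions made for illustration. The coverage guarantee of this scheme relies on the calibration and test points being exchangeable.

```python
import math
import numpy as np

def split_conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
    """Return (lower, upper) intervals with nominal coverage 1 - alpha.

    `predict` is any fitted point predictor; it is treated as a black box.
    """
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = np.sort(np.abs(y_cal - predict(X_cal)))
    n = scores.size
    # Finite-sample corrected rank of the score used as the half-width.
    k = math.ceil((n + 1) * (1 - alpha))
    # If k exceeds n, the calibration set is too small: interval is unbounded.
    q = scores[k - 1] if k <= n else np.inf
    preds = predict(X_new)
    return preds - q, preds + q

# Synthetic stand-in for a QSAR dataset (hypothetical, not from the paper):
# rows are compounds, columns are molecular descriptors.
rng = np.random.default_rng(0)
n_total, n_desc = 1500, 5
X = rng.normal(size=(n_total, n_desc))
true_w = rng.normal(size=n_desc)
y = X @ true_w + rng.normal(scale=0.5, size=n_total)

# Disjoint splits: fit the model, calibrate the intervals, then evaluate.
X_train, y_train = X[:500], y[:500]
X_cal, y_cal = X[500:1000], y[500:1000]
X_test, y_test = X[1000:], y[1000:]

# Ordinary least squares as a placeholder for a DNN or GBM.
w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def predict(X_in):
    return X_in @ w_hat

lower, upper = split_conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1)
coverage = float(np.mean((y_test >= lower) & (y_test <= upper)))
print(f"empirical coverage at nominal 90%: {coverage:.3f}")
```

In this simple variant every interval has the same width, so it is valid on average but does not adapt to compounds that are harder to predict. Swapping in a different model only requires changing predict, which is what "agnostic to the prediction algorithm" means in practice.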

