LG AP · Oct 31, 2022

Evaluating Point-Prediction Uncertainties in Neural Networks for Drug Discovery

Ya Ju Fan, Jonathan E. Allen, Kevin S. McLoughlin, Da Shi, Brian J. Bennion, Xiaohua Zhang, Felice C. Lightstone

arXiv:2210.17043v1 · 1.8 · h-index: 42

Originality: Synthesis-oriented

AI Analysis

This work addresses uncertainty quantification for neural networks in drug discovery. Its contribution is incremental, since it applies existing methods to a specific domain.

The paper tackles the problem of quantifying different sources of predictive uncertainty in neural networks for drug discovery. It demonstrates how selected methods estimate uncertainty under various data partitions and featurization schemes, and how those estimates relate to prediction error.
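One common way to check how uncertainty estimates relate to prediction error is the Spearman rank correlation between each compound's predicted uncertainty and its absolute error. The summary does not say which metric the paper uses, so treat this as an illustrative sketch under that assumption. A positive correlation means compounds flagged as more uncertain tend to be predicted less accurately. It uses only the Python standard library, and the function names are hypothetical.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_uncertainty_vs_error(y_true, y_pred, y_std):
    """Hypothetical helper: Spearman correlation of predicted std vs |error|."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return pearson(rank(y_std), rank(abs_err))


# Toy example: four compounds with measured and predicted activities.
rho = spearman_uncertainty_vs_error(
    y_true=[1.0, 2.0, 3.0, 4.0],
    y_pred=[1.1, 2.5, 2.0, 4.0],
    y_std=[0.1, 0.4, 0.9, 0.05],
)
print(round(rho, 3))
```

Rank correlation is used instead of plain Pearson correlation so that the check only asks whether the ordering of uncertainties matches the ordering of errors. It does not assume the two are on the same scale.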

Neural network (NN) models have the potential to speed up the drug discovery process and reduce its failure rates. Their success requires uncertainty quantification (UQ), because drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Methods that combine Bayesian models with NN models address this issue, but they are difficult to implement and more expensive to train. Some methods require changing the NN architecture or training procedure, which limits the choice of NN models. Moreover, predictive uncertainty can come from different sources. It is important to be able to model different types of predictive uncertainty separately, as the model can take different actions depending on the source of uncertainty. In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models aimed at drug discovery. We use our prior knowledge of chemical compounds to design the experiments. Using a visualization method, we create non-overlapping and chemically diverse partitions from a collection of chemical compounds. These partitions are used as training and test set splits to explore NN model uncertainty. We demonstrate how the uncertainties estimated by the selected methods describe different sources of uncertainty under different partitions and featurization schemes, and how they relate to prediction error.
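The abstract separates sources of predictive uncertainty but does not name the methods used. One widely used way to split them is shown below as an illustrative sketch, not as the paper's method. It uses an ensemble of regressors that each output a mean and a variance for a compound. By the law of total variance, the total is $\mathbb{E}_m[\sigma_m^2] + \mathrm{Var}_m[\mu_m]$:

- The average of the member variances estimates aleatoric (data) uncertainty.
- The spread of the member means estimates epistemic (model) uncertainty.

The function name and the toy inputs are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Hypothetical helper: split an ensemble's predictive variance for one compound.

    member_means[m] and member_variances[m] are the mean and variance predicted
    by ensemble member m (e.g. from a regression head that outputs both).
    Returns (aleatoric, epistemic, total).
    """
    if not member_means or len(member_means) != len(member_variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")
    # Aleatoric: average noise variance the members attribute to the data.
    aleatoric = fmean(member_variances)
    # Epistemic: disagreement between members' mean predictions.
    epistemic = pvariance(member_means)
    # Law of total variance for a uniform mixture of the members.
    return aleatoric, epistemic, aleatoric + epistemic


# Toy example: three ensemble members predicting one compound's activity.
aleatoric, epistemic, total = decompose_uncertainty(
    member_means=[5.0, 5.2, 4.8],
    member_variances=[0.10, 0.12, 0.08],
)
print(f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f} total={total:.3f}")
```

For a compound from a chemically distinct test partition, one would expect the members to disagree more, which raises the epistemic term. Noisy assay labels would instead tend to raise the aleatoric term.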

