Uncertainty quantification of molecular property prediction with Bayesian neural networks
This work addresses uncertainty quantification in molecular property prediction for chemists and other researchers, but its contribution is incremental because it applies an existing Bayesian method to a specific domain.
The paper tackles the uncertainty in molecular property predictions that arises from insufficient training data. It uses Bayesian neural networks to quantify the predictive variance and decompose it into model- and data-driven uncertainties, which enabled the identification of errors in the Harvard Clean Energy Project dataset.
Deep neural networks have outperformed existing machine learning models in various molecular applications. In practical applications, however, it is still difficult to make confident decisions because of the uncertainty in predictions arising from the insufficient quality and quantity of training data. Here, we show with three numerical experiments that Bayesian neural networks are useful for quantifying the uncertainty of molecular property prediction. In particular, they enable us to decompose the predictive variance into model- and data-driven uncertainties, which helps to elucidate the source of errors. In the logP predictions, we show that data noise affected the data-driven uncertainties more significantly than the model-driven ones. Based on this analysis, we were able to find unexpected errors in the Harvard Clean Energy Project dataset. Lastly, by performing experiments on bio-activity and toxicity classification problems, we show that the confidence of prediction is closely related to the predictive uncertainty.
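The decomposition of predictive variance into model- and data-driven parts can be illustrated with a minimal sketch. It assumes one common formulation, which the abstract does not spell out and which may differ from the exact estimator used in the paper: the network is sampled T times (for example by Monte Carlo dropout). For regression, each sample returns a predicted mean and a predicted noise variance; for binary classification, each sample returns a probability. The function names, array shapes, and toy numbers below are hypothetical and chosen for illustration only.

```python
import numpy as np


def decompose_regression_variance(means, variances):
    """Split predictive variance for regression (assumed formulation).

    means, variances: arrays of shape (T, N) holding the predicted mean and
    predicted noise variance for N molecules over T stochastic forward passes.
    Returns (model_driven, data_driven, total), each of shape (N,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Data-driven (aleatoric) part: average of the predicted noise variances.
    data_driven = variances.mean(axis=0)
    # Model-driven (epistemic) part: spread of the predicted means across samples.
    model_driven = means.var(axis=0)
    return model_driven, data_driven, model_driven + data_driven


def decompose_binary_variance(probs):
    """Split predictive variance for binary classification (assumed formulation).

    probs: array of shape (T, N) with the predicted positive-class probability
    for N molecules over T stochastic forward passes.
    """
    probs = np.asarray(probs, dtype=float)
    # Data-driven part: average Bernoulli variance p * (1 - p).
    data_driven = (probs * (1.0 - probs)).mean(axis=0)
    # Model-driven part: spread of the probabilities across samples.
    model_driven = probs.var(axis=0)
    return model_driven, data_driven, model_driven + data_driven


if __name__ == "__main__":
    # Toy numbers only; these are not results from the paper.
    toy_means = [[1.0, 2.0], [3.0, 2.0]]
    toy_vars = [[0.5, 0.1], [0.5, 0.3]]
    print(decompose_regression_variance(toy_means, toy_vars))
    print(decompose_binary_variance([[0.2, 0.9], [0.6, 0.9]]))
```

Under this formulation, the two parts of the binary case add up to the variance of a Bernoulli variable with the averaged probability. A molecule with a large model-driven term is one where the sampled networks disagree, while a large data-driven term points to noise that the model attributes to the data itself.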