LG | Sep 6, 2024

Enhancing Uncertainty Quantification in Drug Discovery with Censored Regression Labels

Emma Svensson, Hannah Rosa Friesacher, Susanne Winiwarter, Lewis Mervin, Adam Arany, Ola Engkvist

arXiv:2409.04313v1 | 10.47 citations | h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses the need for better uncertainty quantification in drug discovery, both to use experimental resources efficiently and to improve trust in models. The contribution is incremental, however, because it adapts existing methods to handle censored labels.

The paper tackles inaccurate uncertainty quantification in drug-discovery models, a problem caused by limited data and sparse experimental observations. It adapts ensemble-based, Bayesian, and Gaussian models to learn from censored labels through the Tobit model. The authors report that this improves accuracy and reliability when modeling real pharmaceutical settings.
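For reference, a standard Tobit-style likelihood for one label under a Gaussian predictive distribution N(μ, σ²) can be written as below. The notation is ours: y* is the unobserved true value, y is the reported number or threshold, and φ and Φ are the standard normal density and CDF. The paper's exact parametrization may differ.

```latex
\mathcal{L}(\mu, \sigma \mid y) =
\begin{cases}
\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y - \mu}{\sigma}\right) & \text{if } y^{*} = y \text{ is observed exactly},\\[8pt]
\Phi\!\left(\dfrac{y - \mu}{\sigma}\right) & \text{if only } y^{*} \le y \text{ is known},\\[8pt]
1 - \Phi\!\left(\dfrac{y - \mu}{\sigma}\right) & \text{if only } y^{*} \ge y \text{ is known}.
\end{cases}
```

Training then minimizes the sum of -log L over all labels. An exact label contributes a density, while a censored label contributes only the probability mass on the correct side of its threshold.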

In the early stages of drug discovery, decisions regarding which experiments to pursue can be influenced by computational models. These decisions are critical due to the time-consuming and expensive nature of the experiments. Therefore, it is becoming essential to accurately quantify the uncertainty in machine learning predictions, such that resources can be used optimally and trust in the models improves. While computational methods for drug discovery often suffer from limited data and sparse experimental observations, additional information can exist in the form of censored labels that provide thresholds rather than precise values of observations. However, the standard approaches that quantify uncertainty in machine learning cannot fully utilize censored labels. In this work, we adapt ensemble-based, Bayesian, and Gaussian models with tools to learn from censored labels by using the Tobit model from survival analysis. Our results demonstrate that despite the partial information available in censored labels, they are essential to accurately and reliably model the real pharmaceutical setting.
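As a rough illustration of learning from thresholds instead of exact values, here is a minimal pure-Python sketch. It assumes three things: a Gaussian predictive distribution per compound, a hypothetical string encoding of labels ("<x", ">x", or a bare number), and a simple moment-matching step that collapses ensemble members into one Gaussian. The helpers `parse_label`, `tobit_nll`, `ensemble_gaussian`, and `dataset_nll` are hypothetical names, not code from the paper.

```python
import math
from typing import List, Tuple

def parse_label(label: str) -> Tuple[float, str]:
    """Hypothetical encoding: '<x' means true value <= x, '>x' means >= x, else exact."""
    label = label.strip()
    if label.startswith("<"):
        return float(label[1:]), "left"   # left-censored: only an upper bound is known
    if label.startswith(">"):
        return float(label[1:]), "right"  # right-censored: only a lower bound is known
    return float(label), "exact"

def _log_norm_cdf(z: float) -> float:
    # log Phi(z) via erfc; the floor avoids log(0) far in the tails
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))

def tobit_nll(mu: float, sigma: float, y: float, kind: str) -> float:
    """Negative log-likelihood of one (possibly censored) label under N(mu, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    if kind == "exact":
        # -log of the Gaussian density at y
        return 0.5 * z * z + 0.5 * math.log(2.0 * math.pi) + math.log(sigma)
    if kind == "left":
        # probability mass P(Y <= y) = Phi(z)
        return -_log_norm_cdf(z)
    if kind == "right":
        # probability mass P(Y >= y) = 1 - Phi(z) = Phi(-z)
        return -_log_norm_cdf(-z)
    raise ValueError(f"unknown censoring kind: {kind}")

def ensemble_gaussian(means: List[float], variances: List[float]) -> Tuple[float, float]:
    """Collapse ensemble members into one Gaussian (mean of means, total std)."""
    m = len(means)
    mu = sum(means) / m
    aleatoric = sum(variances) / m                       # average predicted noise
    epistemic = sum((x - mu) ** 2 for x in means) / m    # disagreement between members
    return mu, math.sqrt(aleatoric + epistemic)

def dataset_nll(preds: List[Tuple[float, float]], labels: List[str]) -> float:
    """Mean Tobit NLL over a mix of exact and censored labels."""
    total = 0.0
    for (mu, sigma), label in zip(preds, labels):
        y, kind = parse_label(label)
        total += tobit_nll(mu, sigma, y, kind)
    return total / len(labels)
```

For example, a prediction of N(3, 1) is barely penalized by the label "<5", because most of its mass already lies below 5. The same prediction is penalized heavily by ">5". The log-space CDF keeps the censored terms from collapsing to log(0). The ensemble step is one common choice (total variance equals mean member variance plus variance of member means). Bayesian or Gaussian-process models would supply μ and σ directly.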

