Comparative Evaluation of Applicability Domain Definition Methods for Regression Models
This work addresses the problem of ensuring reliable predictions in regression models for domains like chemistry or materials science, though it is incremental as it builds on existing techniques.
The study tackles the challenge of defining the applicability domain for regression models by benchmarking eight techniques across seven regression models and five datasets, and by proposing a novel approach based on non-deterministic Bayesian neural networks, which defined the applicability domain more accurately than previous methods.
The applicability domain is the range of data for which a predictive model's output is expected to be reliable and accurate; using a model outside its applicability domain can lead to incorrect results. Being able to define the regions of data space where a predictive model can be safely used is therefore a necessary condition for safer and more reliable predictions. However, defining the applicability domain of a model is a challenging problem, as there is no clear and universal definition or metric for it. This work aims to make the applicability domain more quantifiable and pragmatic. Eight applicability domain detection techniques were applied to seven regression models, trained on five different datasets, and their performance was benchmarked using a validation framework. We also propose a novel approach based on non-deterministic Bayesian neural networks to define the applicability domain of a model. Our method defined the applicability domain more accurately than previous methods, highlighting its potential in this regard.
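To make the idea of an applicability domain concrete, the sketch below shows one common distance-based heuristic: a query point is treated as in-domain if its mean distance to its k nearest training points does not exceed a threshold derived from the training set itself. This is an illustration only. It is not claimed to be one of the eight benchmarked techniques, nor the proposed Bayesian neural network approach; the function names, the choice of k = 5, the 95% quantile, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def knn_mean_distance(X_ref, X_query, k=5, exclude_self=False):
    """Mean Euclidean distance from each query point to its k nearest reference points."""
    # Pairwise distance matrix of shape (n_query, n_ref).
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    d.sort(axis=1)
    # When the query set is the reference set, skip the zero distance to itself.
    start = 1 if exclude_self else 0
    return d[:, start:start + k].mean(axis=1)

def fit_ad_threshold(X_train, k=5, quantile=0.95):
    """Hypothetical threshold: a quantile of the training points' own k-NN distances."""
    train_scores = knn_mean_distance(X_train, X_train, k=k, exclude_self=True)
    return float(np.quantile(train_scores, quantile))

def in_domain(X_train, X_query, threshold, k=5):
    """Boolean mask: True where a query point lies inside the assumed domain."""
    return knn_mean_distance(X_train, X_query, k=k) <= threshold

# Synthetic data for illustration only (not from the study).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
threshold = fit_ad_threshold(X_train)

X_near = np.zeros((1, 3))        # at the centre of the training cloud
X_far = np.full((1, 3), 10.0)    # far from every training point

near_flag = in_domain(X_train, X_near, threshold)
far_flag = in_domain(X_train, X_far, threshold)
print("near point in domain:", bool(near_flag[0]))
print("far point in domain:", bool(far_flag[0]))
```

A threshold taken from the training distribution keeps the rule free of hand-tuned distance units, but the quantile remains a judgment call: a lower value shrinks the domain and rejects more queries, a higher value does the opposite. An uncertainty-based measure, such as the spread of predictions across stochastic forward passes of a Bayesian neural network, could play the same role as the distance score here.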