Automated Learning of Interpretable Models with Quantified Uncertainty
This work addresses the need for interpretable machine learning models with quantified uncertainty, particularly on noisy data. The contribution is incremental in that it builds on existing symbolic regression techniques.
The authors introduce a Bayesian framework for genetic-programming-based symbolic regression to address interpretability and uncertainty quantification on noisy data. Compared to a conventional implementation, the framework improves interpretability and robustness to noise and reduces overfitting.
Interpretability and uncertainty quantification in machine learning can provide justification for decisions, promote scientific discovery, and lead to a better understanding of model behavior. Symbolic regression provides inherently interpretable machine learning, but relatively little work has focused on the use of symbolic regression on noisy data and the accompanying necessity to quantify uncertainty. A new Bayesian framework for genetic-programming-based symbolic regression (GPSR) is introduced that uses model evidence (i.e., marginal likelihood) to formulate the replacement probability during the selection phase of evolution. Model parameter uncertainty is automatically quantified, enabling probabilistic predictions with each equation produced by the GPSR algorithm. Model evidence is also quantified in this process, and its use is shown to increase interpretability, improve robustness to noise, and reduce overfitting when compared to a conventional GPSR implementation on both numerical and physical experiments.
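To make the idea of evidence-based selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes candidate equations that are linear in their parameters, a zero-mean Gaussian prior on those parameters, and a known Gaussian noise level, so that the marginal likelihood has a closed form. The summary above does not state how the evidence is estimated or exactly how the replacement probability is formed. The rule used here (the child's share of the total evidence, i.e., its posterior probability under equal prior odds) is an assumption, as are the names log_evidence, replacement_probability, noise_std, and prior_std.

```python
import math
import numpy as np


def log_evidence(features, y, noise_std=0.1, prior_std=10.0):
    """Closed-form log marginal likelihood (hypothetical helper).

    Assumed model: y = features @ w + noise, with w ~ N(0, prior_std^2 I)
    and noise ~ N(0, noise_std^2 I). Integrating out w gives y ~ N(0, cov).
    """
    n = len(y)
    cov = noise_std**2 * np.eye(n) + prior_std**2 * (features @ features.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


def replacement_probability(log_ev_child, log_ev_parent):
    """Assumed rule: Z_child / (Z_child + Z_parent), computed stably in log space."""
    diff = log_ev_child - log_ev_parent
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    return math.exp(diff) / (1.0 + math.exp(diff))


# Synthetic noisy data from a straight line (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.1, size=x.size)

# Parent: the simple equation a*x + b. Child: a degree-5 polynomial.
simple = np.column_stack([x, np.ones_like(x)])
complex_ = np.vander(x, 6)  # columns x^5, x^4, ..., x^0

log_ev_simple = log_evidence(simple, y)
log_ev_complex = log_evidence(complex_, y)
p_replace = replacement_probability(log_ev_complex, log_ev_simple)

print(f"log evidence (simple):  {log_ev_simple:.2f}")
print(f"log evidence (complex): {log_ev_complex:.2f}")
print(f"P(complex child replaces simple parent): {p_replace:.3g}")
```

Under these assumptions the extra polynomial terms fit the noise slightly better but spread prior mass over a larger parameter space, so the evidence penalizes them. This is the mechanism by which evidence-based selection can discourage overfitting without an explicit complexity term.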