ML · LG · May 26, 2023

Sources of Uncertainty in Supervised Machine Learning -- A Statisticians' View

arXiv:2305.16703v3 · 38 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for precise uncertainty quantification in machine learning, which is crucial for improving model reliability and interpretability in applications. Its contribution is conceptual and incremental.

The paper examines the sources of uncertainty in supervised machine learning, adopting a statistician's view to formalize aleatoric and epistemic uncertainty, and demonstrates that these sources are diverse and not always decomposable.

Supervised machine learning and predictive models have achieved an impressive standard today, enabling us to answer questions that were inconceivable a few years ago. Alongside these successes, it has become clear that beyond pure prediction, which is the primary strength of most supervised machine learning algorithms, the quantification of uncertainty is relevant and necessary as well. However, before quantification is possible, types and sources of uncertainty need to be defined precisely. While first concepts and ideas in this direction have emerged in recent years, this paper adopts a conceptual, basic science perspective and examines possible sources of uncertainty. By adopting the viewpoint of a statistician, we discuss the concepts of aleatoric and epistemic uncertainty, which are more commonly associated with machine learning. The paper aims to formalize the two types of uncertainty and demonstrates that sources of uncertainty are diverse and cannot always be decomposed into aleatoric and epistemic. Drawing parallels between statistical concepts and uncertainty in machine learning, we emphasise the role of data and their influence on uncertainty.

