LGAIJun 3, 2022

Uncertainty Estimation in Machine Learning

arXiv:2206.01749v14 citationsh-index: 7
Originality Synthesis-oriented
AI Analysis

This is an incremental survey paper discussing uncertainty estimation for decision-making in ML, relevant to practitioners using complex models.

This paper examines the challenge of uncertainty estimation in machine learning models, particularly for complex nonlinear models and large pre-trained models like GPT-3, and suggests that non-parametric techniques could address this issue with the help of modern supercomputers.

Most machine learning techniques are based upon statistical learning theory, often simplified for the sake of computing speed. This paper is focused on the uncertainty aspect of mathematical modeling in machine learning. Regression analysis is chosen to further investigate the evaluation aspect of uncertainty in model coefficients and, more importantly, in the output feature value predictions. A survey demonstrates major stages in the conventional least squares approach to the creation of the regression model, along with its uncertainty estimation. On the other hand, it is shown that in machine learning the model complexity and severe nonlinearity become serious obstacles to uncertainty evaluation. Furthermore, the process of machine model training demands high computing power, not available at the level of personal computers. This is why so-called pre-trained models are widely used in such areas of machine learning as natural language processing. The latest example of a pre-trained model is the Generative Pre-trained Transformer 3 with hundreds of billions of parameters and a half-terabyte training dataset. Similarly, mathematical models built from real data are growing in complexity which is accompanied by the growing amount of training data. However, when machine models and their predictions are used in decision-making, one needs to estimate uncertainty and evaluate accompanying risks. This problem could be resolved with non-parametric techniques at the expense of greater demand for computing power, which can be offered by modern supercomputers available, including those utilizing graphical and tensor processing units along with the conventional central processors.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes