ST LG ML · Mar 17, 2022

Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed

arXiv:2203.09179v3 · 38 citations · h-index: 27
Originality: Incremental advance
AI Analysis

The paper identifies a foundational theoretical flaw affecting users of Gaussian process regression, though the advance is incremental because it formalizes known folklore.

The paper demonstrates that maximum likelihood estimation in Gaussian process regression is ill-posed in noiseless settings with stationary covariance functions, leading to predictive distributions that are not Lipschitz in the data under Hellinger distance.
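For readers unfamiliar with the metric, the following is a standard textbook definition of the Hellinger distance, together with one way of writing the Lipschitz property that the summary says fails. The notation ($P_y$ for the predictive distribution obtained from data $y$, the constant $C$, and the choice of norm) is an assumption made here for illustration and may differ from the paper's own formulation; conventions for the factor $\tfrac{1}{2}$ also vary between authors.

```latex
% Squared Hellinger distance between distributions P and Q with densities p and q
H^2(P, Q) = \frac{1}{2} \int \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 \, \mathrm{d}x

% Closed form for two univariate Gaussians
H^2\bigl( \mathcal{N}(\mu_1, \sigma_1^2), \, \mathcal{N}(\mu_2, \sigma_2^2) \bigr)
  = 1 - \sqrt{\frac{2 \sigma_1 \sigma_2}{\sigma_1^2 + \sigma_2^2}}
        \exp\!\left( -\frac{(\mu_1 - \mu_2)^2}{4 (\sigma_1^2 + \sigma_2^2)} \right)

% Lipschitz dependence on the data (assumed notation): there is a constant C with
H(P_y, P_{y'}) \le C \, \lVert y - y' \rVert \quad \text{for all data vectors } y, y'
```

Under this reading, ill-posedness means that no such finite constant $C$ exists: arbitrarily small changes in the data can produce comparatively large changes in the predictive distribution.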

Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
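The post-hoc assessment suggested above can be approximated numerically. The sketch below is a minimal illustration, not the paper's construction: it assumes a zero-mean Gaussian process with a Gaussian (squared-exponential) kernel of unit amplitude, selects the lengthscale by grid search over the noiseless log marginal likelihood, and compares the estimate for a data set with the estimate for a slightly perturbed copy. The function names, the grid, the example data, and the small `jitter` term (added purely for numerical stability) are all assumptions introduced here.

```python
import numpy as np


def gaussian_kernel(x1, x2, lengthscale):
    """Stationary Gaussian kernel with unit amplitude (an assumed choice)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def neg_log_marginal_likelihood(x, y, lengthscale, jitter=1e-8):
    """Negative log marginal likelihood of a zero-mean GP without a noise term.

    The jitter is a numerical stabiliser for the Cholesky factorisation,
    not a model of observation noise.
    """
    n = x.size
    K = gaussian_kernel(x, x, lengthscale) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)  # alpha = L^{-1} y, so alpha @ alpha = y^T K^{-1} y
    return (
        0.5 * alpha @ alpha
        + np.sum(np.log(np.diag(L)))  # equals 0.5 * log det K
        + 0.5 * n * np.log(2.0 * np.pi)
    )


def ml_lengthscale(x, y, grid):
    """Grid-search maximum likelihood estimate of the lengthscale."""
    values = np.array([neg_log_marginal_likelihood(x, y, ell) for ell in grid])
    return grid[np.argmin(values)], values


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 5)
    y = np.sin(2.0 * np.pi * x)  # hypothetical noiseless observations
    grid = np.linspace(0.1, 1.0, 10)

    # Perturb the data slightly and compare the two estimates.
    rng = np.random.default_rng(0)
    y_perturbed = y + 1e-3 * rng.standard_normal(y.size)

    ell, _ = ml_lengthscale(x, y, grid)
    ell_perturbed, _ = ml_lengthscale(x, y_perturbed, grid)
    print("ML lengthscale (original data): ", ell)
    print("ML lengthscale (perturbed data):", ell_perturbed)
```

A large change in the estimated lengthscale under a small perturbation would be a warning sign for the particular data set at hand. Agreement between the two estimates on one example does not establish well-posedness in general, and a fuller check would compare the resulting predictive distributions rather than the lengthscales alone.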

