Scalable Gaussian Process Computations Using Hierarchical Matrices
This work addresses the computational bottleneck of Gaussian process inference for large datasets, enabling scalable MLE with theoretical guarantees.
The authors present a kernel-independent method using hierarchical matrices to enable quasilinear O(n log^2 n) complexity for maximum likelihood estimation in Gaussian processes, including gradient, Hessian, and Fisher information matrix computation. They demonstrate that the resulting MLEs and confidence intervals faithfully approach exact likelihood results.
We present a kernel-independent method that applies hierarchical matrices to the problem of maximum likelihood estimation for Gaussian processes. The proposed approximation provides natural and scalable stochastic estimators for its gradient and Hessian, as well as the expected Fisher information matrix, that are computable in quasilinear $O(n \log^2 n)$ complexity for a large range of models. To accomplish this, we (i) choose a specific hierarchical approximation for covariance matrices that enables the computation of their exact derivatives and (ii) use a stabilized form of the Hutchinson stochastic trace estimator. Since both the observed and expected information matrices can be computed in quasilinear complexity, covariance matrices for MLEs can also be estimated efficiently. After discussing the associated mathematics, we demonstrate the scalability of the method, discuss details of its implementation, and validate that the resulting MLEs and confidence intervals based on the inverse Fisher information matrix faithfully approach those obtained by the exact likelihood.