ML LG Nov 22, 2019

A Fully Natural Gradient Scheme for Improving Inference of the Heterogeneous Multi-Output Gaussian Process Model

arXiv:1911.10225v33.27 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses optimization challenges in multi-output Gaussian process models for researchers in machine learning, offering incremental improvements to existing methods.

The paper tackles the strong conditioning between variational parameters and hyper-parameters in heterogeneous multi-output Gaussian process models, which burdens optimization. It introduces a fully natural gradient scheme that reaches better local optima, with higher test performance than adaptive gradient methods, as demonstrated on toy and real datasets.

A recent novel extension of multi-output Gaussian processes handles heterogeneous outputs, assuming that each output has its own likelihood function. It uses a vector-valued Gaussian process prior to jointly model all likelihoods' parameters as latent functions drawn from a Gaussian process with a linear model of coregionalisation covariance. By means of an inducing points framework, the model is able to obtain tractable variational bounds amenable to stochastic variational inference. Nonetheless, the strong conditioning between the variational parameters and the hyper-parameters burdens the adaptive gradient optimisation methods used in the original approach. To overcome this issue, we borrow ideas from variational optimisation, introducing an exploratory distribution over the hyper-parameters and allowing inference together with the posterior's variational parameters through a fully natural gradient optimisation scheme. Furthermore, in this work we introduce an extension of the heterogeneous multi-output model in which the latent functions are drawn from convolution processes. We show that our optimisation scheme can achieve better local optima solutions with higher test performance rates than adaptive gradient methods, for both the linear model of coregionalisation and the convolution processes model. We also show how to make the convolutional model scalable by means of stochastic variational inference and how to optimise it through a fully natural gradient scheme. We compare the performance of the different methods over toy and real databases.
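
The idea of placing an exploratory distribution over the hyper-parameters and updating it with natural gradients can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a Gaussian exploratory distribution q = N(mu, S^-1), a zero-mean Gaussian prior with precision lam, and a simple quadratic objective standing in for the negative variational bound. The function name and all parameters are hypothetical. For a quadratic the expected gradients are available in closed form, so no sampling is needed.

```python
import numpy as np

def natural_gradient_vo(A, a, lam=1.0, beta=0.5, n_steps=60):
    """Toy natural-gradient variational optimisation (illustrative only).

    Minimises  E_q[f(theta)] + KL(q || p)  over q = N(mu, S^-1), where
      f(theta) = 0.5 * (theta - a)^T A (theta - a)   (assumed stand-in objective)
      p        = N(0, I / lam)                        (assumed prior)
    """
    d = a.shape[0]
    mu = np.zeros(d)   # mean of the exploratory distribution
    S = np.eye(d)      # precision (inverse covariance) of q
    for _ in range(n_steps):
        # Gradients of the objective w.r.t. the mean and covariance of q.
        g_mu = A @ (mu - a) + lam * mu
        g_Sigma = 0.5 * A + 0.5 * (lam * np.eye(d) - S)
        # Natural-gradient step in the Gaussian's natural parameters:
        # update the precision first, then precondition the mean step with it.
        S = S + 2.0 * beta * g_Sigma
        mu = mu - beta * np.linalg.solve(S, g_mu)
    return mu, S

A = np.array([[2.0, 0.5], [0.5, 1.0]])
a = np.array([1.0, -1.0])
mu, S = natural_gradient_vo(A, a)
print("mean:", mu)
print("precision:\n", S)
```

In this toy setting the precision converges to A + lam * I and the mean to the solution of (A + lam * I) mu = A a. The mean step is preconditioned by the current precision, which is what distinguishes it from a plain or adaptive gradient step.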
