ML LG | May 7, 2020

Relevance Vector Machine with Weakly Informative Hyperprior and Extended Predictive Information Criterion

arXiv:2005.03419v11.4

Originality: Incremental advance

AI Analysis

This work addresses model selection and overfitting issues in kernel-based regression for non-homogeneous data, representing an incremental improvement in Bayesian machine learning methods.

The authors tackled the problem of overfitting in multiple kernel relevance vector regression by proposing a weakly informative inverse gamma hyperprior and an extended predictive information criterion, achieving improved predictive accuracy on non-homogeneous data.

In the variational relevance vector machine, the gamma distribution is the typical choice of hyperprior over the noise precision of the automatic relevance determination prior. Instead of the gamma hyperprior, we propose to use the inverse gamma hyperprior with a shape parameter close to zero and a scale parameter not necessarily close to zero. This hyperprior is associated with the concept of a weakly informative prior. The effect of this hyperprior is investigated through regression on non-homogeneous data. Because it is difficult to capture the structure of such data with a single kernel function, we apply the multiple kernel method, in which multiple kernel functions with different widths are arranged for the input data. We confirm that the degrees of freedom of the model are controlled by adjusting the scale parameter while keeping the shape parameter close to zero. A candidate for selecting the scale parameter is the predictive information criterion. However, the model selected using this criterion appears to overfit. This is because the multiple kernel method puts the model in a situation where the dimension of the model is larger than the data size. To select an appropriate scale parameter even in such a situation, we also propose an extended predictive information criterion. We confirm that a multiple kernel relevance vector regression model with good predictive accuracy can be obtained by selecting the scale parameter that minimizes the extended predictive information criterion.
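The point about model dimension exceeding data size can be illustrated with a minimal sketch of a multiple kernel design matrix. This is not the authors' code: the Gaussian kernel form, the function name `multi_kernel_design`, the three widths, and the sample size are all assumptions chosen for illustration. The abstract only states that kernels with different widths are arranged over the input data.

```python
import numpy as np

def multi_kernel_design(x, widths):
    """Stack one Gaussian kernel block per width, each centred on every input point.

    Hypothetical helper: the Gaussian kernel is an assumed choice, not taken from the paper.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dist = (x - x.T) ** 2  # pairwise squared distances, shape (n, n)
    blocks = [np.exp(-sq_dist / (2.0 * h ** 2)) for h in widths]
    return np.hstack(blocks)  # shape (n, n * len(widths))

x = np.linspace(0.0, 1.0, 50)      # 50 one-dimensional inputs (illustrative)
widths = [0.01, 0.05, 0.2]         # hypothetical kernel widths
Phi = multi_kernel_design(x, widths)

n_samples, n_basis = Phi.shape
print(n_samples, n_basis)          # prints: 50 150
```

With three widths, the 50 observations must determine 150 basis weights, which is the regime in which the abstract reports that the ordinary predictive information criterion tends to select overfitted models.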
