ML
Jun 29, 2015

Bayesian Nonparametric Kernel-Learning

arXiv:1506.08776v2
72 citations
Originality: Highly original
AI Analysis

The paper addresses two problems in kernel methods for large-scale regression and classification: the a priori choice of kernel and the computational cost of working with kernels. It proposes a new method rather than an incremental improvement on an existing one.

The paper tackles kernel selection and scalability in kernel methods by introducing Bayesian nonparametric kernel-learning (BaNK), a data-driven framework that learns kernels and scales to large datasets. The authors report that it outperforms several other scalable kernel-learning approaches on a variety of real-world datasets.

Kernel methods are ubiquitous tools in machine learning. However, there is often little reason for the common practice of selecting a kernel a priori. Even if a universal approximating kernel is selected, the quality of the finite-sample estimator may be greatly affected by the choice of kernel. Furthermore, when directly applying kernel methods, one typically needs to compute an $N \times N$ Gram matrix of pairwise kernel evaluations to work with a dataset of $N$ instances. The computation of this Gram matrix precludes the direct application of kernel methods on large datasets, and makes kernel learning especially difficult. In this paper we introduce Bayesian nonparametric kernel-learning (BaNK), a generic, data-driven framework for scalable learning of kernels. BaNK places a nonparametric prior on the spectral distribution of random frequencies, allowing it to both learn kernels and scale to large datasets. We show that this framework can be used for large-scale regression and classification tasks. Furthermore, we show that BaNK outperforms several other scalable approaches for kernel learning on a variety of real-world datasets.
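To make the role of "random frequencies" concrete, here is a minimal sketch of the random Fourier feature construction that this kind of approach builds on. It is not BaNK itself: the spectral distribution is fixed to a Gaussian (which corresponds to an RBF kernel) rather than learned under a nonparametric prior, and the function names, the toy data, and the values of `sigma`, `D`, and `lam` are all assumptions chosen for illustration. The point is only that regression can be done with an $N \times D$ feature matrix and a $D \times D$ linear system, without ever forming the $N \times N$ Gram matrix.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X (N x d) to D cosine features using frequencies W (D x d) and phases b (D,)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def fit_ridge(Z, y, lam=1e-3):
    """Ridge regression in feature space: solves a D x D system, not an N x N one."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

rng = np.random.default_rng(0)
N, d, D = 2000, 1, 500  # toy sizes (assumed for illustration)

# Toy 1-D regression problem (hypothetical data, not from the paper).
X = rng.uniform(-3.0, 3.0, size=(N, d))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(N)

# Fixed spectral distribution N(0, I / sigma^2), i.e. an RBF kernel with
# bandwidth sigma. A kernel-learning method would adapt this distribution
# to the data instead of fixing it up front.
sigma = 0.5
W = rng.standard_normal((D, d)) / sigma
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = random_fourier_features(X, W, b)   # N x D feature matrix
beta = fit_ridge(Z, y)
mse = float(np.mean((Z @ beta - y) ** 2))

# Sanity check: inner products of features approximate the exact RBF kernel.
sq_dists = np.sum((X[:5, None, :] - X[None, :5, :]) ** 2, axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma ** 2))
K_approx = Z[:5] @ Z[:5].T
max_kernel_err = float(np.max(np.abs(K_approx - K_exact)))
```

Changing how `W` is sampled changes the implied kernel, which is why placing a prior on the distribution of frequencies amounts to placing a prior on kernels.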
