ME, QM, AP, ML | Aug 5, 2015

Bayesian Approximate Kernel Regression with Variable Selection

arXiv:1508.01217v4 | 47 citations
Originality: Incremental advance
AI Analysis

The paper addresses variable selection in kernel regression for fields such as statistical genetics, offering a method that is competitive in both genomic selection and association mapping. The advance is incremental: it builds on existing kernel and linear model approaches.

The paper tackles variable selection in nonlinear kernel regression by proposing a Bayesian approximate kernel regression (BAKR) framework that defines effect size analogs for explanatory variables using random Fourier expansions, enabling competitive performance in genomic selection and association mapping.

Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an effect size analog of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant (for example, the Gaussian kernel). We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. The projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e., phenotypic prediction) and association mapping (i.e., inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings.
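
The two ideas in the abstract (a random Fourier approximation of a shift-invariant kernel, and a projection of the fitted function back onto the original variables) can be illustrated with a short NumPy sketch. This is not the authors' implementation. It assumes a Gaussian kernel with a hand-picked bandwidth, swaps the Bayesian model for a plain ridge fit in the random feature space, and assumes the projection is a least-squares map through the pseudoinverse of the design matrix. The function names (random_fourier_features, gaussian_kernel, effect_size_analog) and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def random_fourier_features(X, n_features, bandwidth, rng):
    """Map X (n, p) to Z (n, n_features) such that Z @ Z.T approximates
    the Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth**2))."""
    p = X.shape[1]
    # Frequencies drawn from the Fourier transform of the Gaussian kernel.
    W = rng.normal(scale=1.0 / bandwidth, size=(p, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def gaussian_kernel(X, bandwidth):
    """Exact Gaussian kernel matrix, used here only to check the approximation."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))


def effect_size_analog(X, Z, y, ridge=1.0):
    """Fit y on the random features, then project the fitted values onto X.

    Assumption: a ridge fit stands in for the Bayesian posterior, and the
    projection is the least-squares solution pinv(X) @ f_hat.
    """
    d = Z.shape[1]
    theta = np.linalg.solve(Z.T @ Z + ridge * np.eye(d), Z.T @ y)
    f_hat = Z @ theta                    # nonlinear fit at the training points
    beta = np.linalg.pinv(X) @ f_hat     # one coefficient per original variable
    return beta, f_hat


# Toy data: only the first of five variables drives the (nonlinear) response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

Z = random_fourier_features(X, n_features=1000, bandwidth=2.0, rng=rng)
approx_error = np.mean(np.abs(Z @ Z.T - gaussian_kernel(X, 2.0)))
beta, f_hat = effect_size_analog(X, Z, y)

print("mean absolute kernel approximation error:", approx_error)
print("effect size analogs:", np.round(beta, 3))
```

In this toy setup the coefficient for the first variable should dominate the others, which is the sense in which the projection acts as an effect size analog. A purely even effect, such as a squared term on centered data, would project to nearly zero under this linear map, so the sketch should not be read as a complete variable selection procedure.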

Code Implementations: 1 repo