A Compositional Kernel Model for Feature Learning
This work provides a simple testbed for studying feature learning in compositional architectures, with guarantees on which input variables are retained and which are discarded.
This paper introduces a compositional kernel ridge regression model that reweights inputs coordinate-wise. It shows that the model recovers relevant variables and eliminates noise variables, and that at stationary points $\ell_1$-type kernels, such as the Laplace kernel, recover features with nonlinear effects, whereas Gaussian kernels recover only linear ones.
We study a compositional variant of kernel ridge regression in which the predictor is applied to a coordinate-wise reweighting of the inputs. Formulated as a variational problem, this model provides a simple testbed for feature learning in compositional architectures. From the perspective of variable selection, we show how relevant variables are recovered while noise variables are eliminated. We establish guarantees showing that both global minimizers and stationary points discard noise coordinates when the noise variables are Gaussian distributed. A central finding is that $\ell_1$-type kernels, such as the Laplace kernel, succeed in recovering features contributing to nonlinear effects at stationary points, whereas Gaussian kernels recover only linear ones.
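To make the setup concrete, the following is a minimal empirical sketch of such a model, not the paper's exact formulation. It assumes a Laplace kernel applied to reweighted inputs, $k_\beta(x, z) = \exp(-\|\beta \odot (x - z)\|_1)$, no intercept, and the objective $\frac{1}{2n}\sum_i (y_i - f(\beta \odot x_i))^2 + \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2$. The normalization, the function names `laplace_kernel` and `profile_objective`, and the synthetic data are illustrative assumptions. For fixed weights $\beta$, the inner minimization over $f$ is ordinary kernel ridge regression and has a closed form, which leaves an objective in $\beta$ alone.

```python
import numpy as np

def laplace_kernel(X, Z, beta):
    """l1-type kernel on reweighted inputs: exp(-||beta * (x - z)||_1)."""
    diff = (X[:, None, :] - Z[None, :, :]) * beta  # shape (n, m, d)
    return np.exp(-np.abs(diff).sum(axis=-1))

def profile_objective(beta, X, y, lam):
    """Minimum over f of (1/2n)||y - f(beta * X)||^2 + (lam/2)||f||_H^2.

    For fixed beta the minimizer is kernel ridge regression with
    coefficients alpha = (K + n*lam*I)^{-1} y, and the minimum value
    simplifies to (lam/2) * y^T alpha.
    """
    n = X.shape[0]
    K = laplace_kernel(X, X, beta)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return 0.5 * lam * (y @ alpha)

# Synthetic data (assumed for illustration): coordinate 0 carries a purely
# nonlinear signal, coordinate 1 is Gaussian noise unrelated to y.
rng = np.random.default_rng(0)
n, lam = 200, 0.01
X = rng.normal(size=(n, 2))
y = np.abs(X[:, 0])

beta_signal = np.array([1.0, 0.0])  # noise coordinate switched off
beta_both = np.array([1.0, 1.0])    # noise coordinate kept
beta_none = np.array([0.0, 0.0])    # all coordinates switched off

J_signal = profile_objective(beta_signal, X, y, lam)
J_both = profile_objective(beta_both, X, y, lam)
J_none = profile_objective(beta_none, X, y, lam)
print(f"signal only: {J_signal:.4f}  both: {J_both:.4f}  none: {J_none:.4f}")
```

Setting a coordinate of $\beta$ to zero removes that variable from the predictor entirely, so variable selection in this sketch amounts to asking which coordinates of $\beta$ are nonzero at a minimizer or stationary point of the profiled objective. Replacing `laplace_kernel` with a Gaussian kernel on the same reweighted inputs gives the comparison case discussed in the abstract.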