Learning with Hierarchical Gaussian Kernels
This work addresses the need of machine learning researchers and practitioners for theoretically grounded, interpretable alternatives to deep learning. It appears incremental, since it builds on existing kernel methods.
The paper tackles the problem of designing interpretable kernel functions that mimic deep neural network architectures. It investigates iterated compositions of weighted sums of Gaussian kernels and shows that these kernels are universal and that SVMs using them are universally consistent. It also describes an optimization method for the kernel parameters and compares it empirically to SVMs, random forests, a multiple kernel learning approach, and deep neural networks.
We investigate iterated compositions of weighted sums of Gaussian kernels and provide an interpretation of the construction that shows some similarities with the architectures of deep neural networks. On the theoretical side, we show that these kernels are universal and that SVMs using these kernels are universally consistent. We further describe a parameter optimization method for the kernel parameters and empirically compare this method to SVMs, random forests, a multiple kernel learning approach, and to some deep neural networks.
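To make the construction more concrete, the following is a minimal sketch of a two-layer kernel of this general type, not the authors' implementation. It assumes that each first-layer Gaussian kernel acts on a subset of the input coordinates and that the second layer applies a Gaussian to the distance induced by the weighted sum of the first-layer kernels. The function names, the kernel parametrization exp(-||x - y||^2 / gamma^2), and the arguments `groups`, `weights`, `inner_gammas`, and `outer_gamma` are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(X, Y, gamma):
    """Gaussian kernel matrix with entries exp(-||x - y||^2 / gamma^2)."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / gamma**2)


def two_layer_gaussian_kernel(X, Y, groups, weights, inner_gammas, outer_gamma):
    """Two-layer kernel: a Gaussian of a weighted sum of Gaussian kernels.

    groups       -- list of coordinate index lists, one per first-layer kernel
                    (hypothetical parameter name)
    weights      -- one weight per group; it enters the sum squared
    inner_gammas -- one width per first-layer kernel
    outer_gamma  -- width of the second-layer Gaussian
    """
    dist_sq = np.zeros((X.shape[0], Y.shape[0]))
    for idx, w, gamma in zip(groups, weights, inner_gammas):
        K = gaussian_kernel(X[:, idx], Y[:, idx], gamma)
        # For a kernel k with k(x, x) = 1, the squared feature-space distance
        # is k(x, x) + k(y, y) - 2 k(x, y) = 2 - 2 k(x, y).
        dist_sq += w**2 * (2.0 - 2.0 * K)
    return np.exp(-dist_sq / outer_gamma**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 4))
    K = two_layer_gaussian_kernel(
        X, X,
        groups=[[0, 1], [2, 3]],
        weights=[1.0, 0.5],
        inner_gammas=[1.0, 2.0],
        outer_gamma=1.5,
    )
    print(K.shape)
```

Under these assumptions, a matrix computed this way could be passed to any SVM solver that accepts precomputed kernel matrices. Deeper kernels would repeat the same step, with each layer's distance computed from the kernels of the layer below.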