Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior
This work provides a PAC-Bayesian theoretical framework for kernel random Fourier features (RFF), which may improve the efficiency and interpretability of kernel methods in machine learning applications. The contribution appears incremental, since it builds on existing RFF and PAC-Bayesian concepts.
The paper revisits kernel random Fourier features through PAC-Bayesian theory, treating the kernel's Fourier transform as a prior and deriving generalization bounds that are optimized by a closed-form pseudo-posterior. This leads to two learning strategies: a compact landmark-based representation of the data, and a PAC-Bayesian justification of kernel alignment.
We revisit the kernel random Fourier features (RFF) method of Rahimi and Recht (2007) through the lens of PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. This view naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a pseudo-posterior obtained from a closed-form expression. Based on this study, we consider two learning strategies: the first finds a compact landmark-based representation of the data, where each landmark is given by a distribution-tailored similarity measure, while the second provides a PAC-Bayesian justification for the kernel alignment method of Sinha and Duchi (2016).
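To make the "Fourier transform as a prior" reading concrete, the following minimal sketch illustrates the standard RFF identity for a Gaussian kernel, k(x, y) = E_ω[cos(ω · (x − y))] with ω drawn from N(0, σ⁻² I). It is an illustration only, not the paper's code: the function names (`gaussian_kernel`, `sample_prior`, `rff_kernel_estimate`), the choice of a Gaussian kernel, and all parameter values are assumptions made for this example. Each sampled frequency ω indexes one trigonometric hypothesis, and the uniform average over samples corresponds to using the prior itself; a learned pseudo-posterior would replace that uniform average with a reweighting, which is not shown here.

```python
import numpy as np


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))


def sample_prior(num_features, dim, sigma, rng):
    """Draw frequencies from the kernel's Fourier transform.

    For the Gaussian kernel above, this distribution is N(0, I / sigma^2).
    """
    return rng.normal(loc=0.0, scale=1.0 / sigma, size=(num_features, dim))


def rff_kernel_estimate(x, y, omegas):
    """Monte Carlo estimate of k(x, y) as the mean of cos(omega . (x - y))."""
    return float(np.mean(np.cos(omegas @ (x - y))))


# Illustrative values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
dim, sigma = 5, 2.0
x = rng.normal(size=dim)
y = rng.normal(size=dim)

omegas = sample_prior(num_features=20000, dim=dim, sigma=sigma, rng=rng)
exact = gaussian_kernel(x, y, sigma)
approx = rff_kernel_estimate(x, y, omegas)
print(f"exact kernel: {exact:.4f}, RFF estimate: {approx:.4f}")
```

The estimate's error shrinks at the usual Monte Carlo rate of roughly 1/sqrt(num_features), so the number of sampled frequencies trades accuracy against the size of the representation.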