ST · LG · FA · ML · Jun 6, 2015

Optimal Rates for Random Fourier Features

arXiv:1506.02155v2 · 140 citations
AI Analysis

This work targets the poor computational scalability of kernel methods, a problem that matters to machine learning practitioners, and provides theoretical insights that are incremental but rigorous.

The paper addresses a theoretical gap in understanding the approximation quality of Random Fourier Features (RFF) for kernel methods. It establishes optimal performance guarantees in the uniform norm, provides guarantees in $L^r$ norms, and proposes an RFF approximation for kernel derivatives together with a theoretical analysis of its approximation quality.
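For reference, the standard RFF construction behind the summary above can be sketched as follows. This assumes a real-valued, continuous, shift-invariant kernel on $\mathbb{R}^d$ normalized so that $k(x,x)=1$; the notation is ours rather than the paper's, and the derivative formula shows one natural way to approximate derivatives (differentiating the estimate), not necessarily the paper's exact estimator.

By Bochner's theorem, such a kernel can be written as

$$k(x,y) = \psi(x-y) = \int_{\mathbb{R}^d} \cos\big(\omega^\top (x-y)\big)\, d\Lambda(\omega),$$

where $\Lambda$ is a probability measure on $\mathbb{R}^d$. Drawing $\omega_1,\dots,\omega_m \overset{\text{i.i.d.}}{\sim} \Lambda$ gives the Monte Carlo estimate

$$\hat k_m(x,y) = \frac{1}{m}\sum_{j=1}^{m} \cos\big(\omega_j^\top (x-y)\big) = \langle \Phi_m(x), \Phi_m(y)\rangle, \qquad \Phi_m(x) = \frac{1}{\sqrt{m}}\big(\cos(\omega_1^\top x),\dots,\cos(\omega_m^\top x),\ \sin(\omega_1^\top x),\dots,\sin(\omega_m^\top x)\big),$$

where the second equality uses $\cos a\cos b + \sin a\sin b = \cos(a-b)$. Differentiating the estimate in the $i$-th coordinate of $x$ gives

$$\frac{\partial \hat k_m(x,y)}{\partial x_i} = -\frac{1}{m}\sum_{j=1}^{m} \omega_{j,i}\,\sin\big(\omega_j^\top (x-y)\big),$$

which targets $\partial k(x,y)/\partial x_i = -\int \omega_i \sin\big(\omega^\top (x-y)\big)\, d\Lambda(\omega)$ whenever $\int |\omega_i|\, d\Lambda(\omega) < \infty$. The uniform-norm error over a set $S$ is $\sup_{x,y\in S} \lvert \hat k_m(x,y) - k(x,y)\rvert$.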

Kernel methods represent one of the most powerful tools in machine learning to tackle problems expressed in terms of function values and derivatives, due to their capability to represent and model complex relations. While these methods show good versatility, they are computationally intensive and scale poorly to large data, as they require operations on Gram matrices. To mitigate this serious computational limitation, randomized constructions that allow the application of fast linear algorithms have recently been proposed in the literature. Random Fourier features (RFF) are among the most popular and widely applied constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this paper, we provide a detailed finite-sample theoretical analysis of the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in $L^r$ ($1\le r<\infty$) norms. We also propose an RFF approximation to derivatives of a kernel, with a theoretical study of its approximation quality.
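To make the construction concrete, here is a minimal NumPy sketch, assuming the Gaussian kernel $k(x,y)=\exp(-\lVert x-y\rVert^2/(2\sigma^2))$, whose spectral measure is $N(0, \sigma^{-2} I_d)$. It builds the cos/sin feature map, reports the largest entrywise error of the approximate Gram matrix over a finite sample of points (only a proxy for the uniform norm over a set, not the supremum itself), and differentiates the RFF estimate to approximate the kernel's gradient in $x$. All function names (`rff_features`, `rff_kernel_grad_x`, and so on) are hypothetical and chosen for illustration; this is not the authors' code.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Exact Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def gaussian_kernel_grad_x(X, Y, sigma):
    """Exact gradient of k(x, y) in x: -(x - y) / sigma^2 * k(x, y); shape (n_x, n_y, d)."""
    diff = X[:, None, :] - Y[None, :, :]
    return -diff / sigma**2 * gaussian_kernel(X, Y, sigma)[:, :, None]


def sample_frequencies(m, d, sigma, rng):
    """Draw m frequencies from the Gaussian kernel's spectral measure N(0, sigma^{-2} I_d)."""
    return rng.normal(loc=0.0, scale=1.0 / sigma, size=(m, d))


def rff_features(X, omegas):
    """Feature map Phi_m(x) = m^{-1/2} [cos(omega_j^T x), sin(omega_j^T x)], shape (n, 2m)."""
    proj = X @ omegas.T  # (n, m)
    m = omegas.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(m)


def rff_kernel_grad_x(X, Y, omegas):
    """Gradient in x of the RFF estimate: -(1/m) sum_j omega_j sin(omega_j^T (x - y))."""
    diff = X[:, None, :] - Y[None, :, :]  # (n_x, n_y, d)
    phase = diff @ omegas.T  # (n_x, n_y, m)
    return -(np.sin(phase) @ omegas) / omegas.shape[0]  # (n_x, n_y, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, sigma = 3, 1.0
    X = rng.uniform(-1.0, 1.0, size=(200, d))  # finite sample from S = [-1, 1]^3
    K = gaussian_kernel(X, X, sigma)
    for m in (10, 100, 1000, 10000):
        omegas = sample_frequencies(m, d, sigma, rng)
        Phi = rff_features(X, omegas)
        # Largest entrywise error over the sample: a proxy for the uniform error on S x S.
        err = np.abs(Phi @ Phi.T - K).max()
        print(f"m={m:6d}  max |k_m - k| over sample = {err:.4f}")

    Xs = X[:30]  # a smaller sample keeps the (n, n, m) phase tensor small
    G = gaussian_kernel_grad_x(Xs, Xs, sigma)
    for m in (100, 1000, 5000):
        omegas = sample_frequencies(m, d, sigma, rng)
        err = np.abs(rff_kernel_grad_x(Xs, Xs, omegas) - G).max()
        print(f"m={m:6d}  max |grad k_m - grad k| over sample = {err:.4f}")
```

The cos/sin pairing is used (rather than a single cosine with a random phase) because it makes $\langle \Phi_m(x), \Phi_m(y)\rangle$ equal the Monte Carlo average $\frac{1}{m}\sum_j \cos(\omega_j^\top(x-y))$ exactly, so $\hat k_m(x,x)=1$ for every $x$. As $m$ grows, the printed errors should shrink, though the exact values depend on the random seed.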
