LGDSOct 1, 2022

On The Relative Error of Random Fourier Features for Preserving Kernel Distance

arXiv:2210.00244v22 citationsh-index: 19
Originality Highly original
AI Analysis

This addresses a theoretical gap in kernel methods for machine learning practitioners, providing both negative and positive results on relative error preservation, with incremental extensions to data-oblivious dimension reduction.

The paper tackles the problem of approximating kernel distances with relative error using random Fourier features (RFF), showing that for kernels like Laplacian, low-dimensional RFF cannot achieve small relative error, but for analytic shift-invariant kernels, it achieves ε-relative error with poly(ε^{-1} log n) dimensions, and validates this with superior performance on simulated datasets compared to other methods.

The method of random Fourier features (RFF), proposed in a seminal paper by Rahimi and Recht (NIPS'07), is a powerful technique to find approximate low-dimensional representations of points in (high-dimensional) kernel space, for shift-invariant kernels. While RFF has been analyzed under various notions of error guarantee, the ability to preserve the kernel distance with \emph{relative} error is less understood. We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions. We complement this by showing as long as the shift-invariant kernel is analytic, RFF with $\mathrm{poly}(ε^{-1} \log n)$ dimensions achieves $ε$-relative error for pairwise kernel distance of $n$ points, and the dimension bound is improved to $\mathrm{poly}(ε^{-1}\log k)$ for the specific application of kernel $k$-means. Finally, going beyond RFF, we make the first step towards data-oblivious dimension-reduction for general shift-invariant kernels, and we obtain a similar $\mathrm{poly}(ε^{-1} \log n)$ dimension bound for Laplacian kernels. We also validate the dimension-error tradeoff of our methods on simulated datasets, and they demonstrate superior performance compared with other popular methods including random-projection and Nyström methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes