DSLGCAJun 12, 2020

Fourier Sparse Leverage Scores and Approximate Kernel Learning

arXiv:2006.07340v326 citations
AI Analysis

This work addresses kernel approximation and active learning challenges in machine learning, offering practical improvements for low-dimensional data and non-uniform distributions, though it is incremental by extending prior results from uniform measures.

The paper tackles the problem of bounding leverage scores for Fourier sparse functions under Gaussian and Laplace measures, with results enabling a new random Fourier features algorithm for kernel approximation that uses a near-optimal number of features for low-dimensional data and provides optimal active learning distributions for non-uniform data.

We prove new explicit upper bounds on the leverage scores of Fourier sparse functions under both the Gaussian and Laplace measures. In particular, we study $s$-sparse functions of the form $f(x) = \sum_{j=1}^s a_j e^{i λ_j x}$ for coefficients $a_j \in \mathbb{C}$ and frequencies $λ_j \in \mathbb{R}$. Bounding Fourier sparse leverage scores under various measures is of pure mathematical interest in approximation theory, and our work extends existing results for the uniform measure [Erd17,CP19a]. Practically, our bounds are motivated by two important applications in machine learning: 1. Kernel Approximation. They yield a new random Fourier features algorithm for approximating Gaussian and Cauchy (rational quadratic) kernel matrices. For low-dimensional data, our method uses a near optimal number of features, and its runtime is polynomial in the $statistical\ dimension$ of the approximated kernel matrix. It is the first "oblivious sketching method" with this property for any kernel besides the polynomial kernel, resolving an open question of [AKM+17,AKK+20b]. 2. Active Learning. They can be used as non-uniform sampling distributions for robust active learning when data follows a Gaussian or Laplace distribution. Using the framework of [AKM+19], we provide essentially optimal results for bandlimited and multiband interpolation, and Gaussian process regression. These results generalize existing work that only applies to uniformly distributed data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes