DSLGNov 5, 2017

Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

arXiv:1711.01596v114 citations
Originality Incremental advance
AI Analysis

This work addresses the efficiency of kernel methods in machine learning, providing lower bounds that are incremental but clarify the hardness of improving current algorithms.

The paper tackles the computational limits of low-rank kernel approximation, showing that for many kernels like Gaussian and polynomial, achieving relative error approximation requires time proportional to the non-zero entries in the input matrix times the rank, matching existing algorithms and indicating that significant speedups to input sparsity runtimes are unlikely. However, it also demonstrates that input sparsity time approximation is possible for radial basis function kernels in the related problem of kernelized dataset approximation.

Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly. In this work we study the limits of computationally efficient low-rank kernel approximation. We show that for a broad class of kernels, including the popular Gaussian and polynomial kernels, computing a relative error $k$-rank approximation to $K$ is at least as difficult as multiplying the input data matrix $A \in \mathbb{R}^{n \times d}$ by an arbitrary matrix $C \in \mathbb{R}^{d \times k}$. Barring a breakthrough in fast matrix multiplication, when $k$ is not too large, this requires $Ω(nnz(A)k)$ time where $nnz(A)$ is the number of non-zeros in $A$. This lower bound matches, in many parameter regimes, recent work on subquadratic time algorithms for low-rank approximation of general kernels [MM16,MW17], demonstrating that these algorithms are unlikely to be significantly improved, in particular to $O(nnz(A))$ input sparsity runtimes. At the same time there is hope: we show for the first time that $O(nnz(A))$ time approximation is possible for general radial basis function kernels (e.g., the Gaussian kernel) for the closely related problem of low-rank approximation of the kernelized dataset.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes