NALGMSCOFeb 2, 2022

Giga-scale Kernel Matrix Vector Multiplication on GPU

arXiv:2202.01085v4
Originality Highly original
AI Analysis

This addresses scalability issues in machine learning and scientific computing for applications like Gaussian Process regression, offering significant speedups over existing methods.

The paper tackled the computational constraints of kernel matrix-vector multiplication (KMVM) for large datasets by proposing the Faster-Fast and Free Memory Method (F3M), which achieves linear time and memory complexity with a relative error of 10^-3 and computes KMVM for a billion points in under a minute on a GPU.

Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined \textit{Faster-Fast and Free Memory Method} ($\fthreem$) to address these scaling issues of KMVM for tall~($10^8\sim 10^9$) and skinny~($D\leq7$) data. Extensive experiments demonstrate that $\fthreem$ has empirical \emph{linear time and memory} complexity with a relative error of order $10^{-3}$ and can compute a full KMVM for a billion points \emph{in under a minute} on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in for the state-of-the-art GPU-based linear solver FALKON, \emph{improving speed 1.5-5.5 times} at the cost of $<1\%$ drop in accuracy. We further demonstrate competitive results on \emph{Gaussian Process regression} coupled with significant speedups on a variety of real-world datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes