LG | DS | Jul 31, 2025

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

arXiv:2507.23539v1 | 3 citations | h-index: 14 | ICLR
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks in large-scale machine learning, such as attention mechanisms in transformers. It is incremental in that it builds on prior kernel methods by adding a new sparsity condition.

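The paper tackles fast matrix-vector multiplication for asymmetric Gaussian kernel matrices $K$, motivated by attention processing in LLMs. It presents the first subquadratic-time algorithm that works for unrestricted vectors under a sparsity assumption, namely that the sum of the entries of $K$ grows linearly in $n$. The output $y$ satisfies $\|Kx-y\|_2\leq ε\|x\|_2$, and the assumption itself is validated experimentally.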

Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian kernel matrices $K\in \mathbb{R}^{n\times n}$. The columns of $K$ are indexed by a set of $n$ keys $k_1,k_2,\ldots,k_n\in \mathbb{R}^d$, its rows by a set of $n$ queries $q_1,q_2,\ldots,q_n\in \mathbb{R}^d$, and its $(i,j)$ entry is $K_{ij} = e^{-\|q_i-k_j\|_2^2/(2σ^2)}$ for some bandwidth parameter $σ>0$. Given a vector $x\in \mathbb{R}^n$ and an error parameter $ε>0$, our task is to output a $y\in \mathbb{R}^n$ such that $\|Kx-y\|_2\leq ε\|x\|_2$ in time subquadratic in $n$ and linear in $d$. Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to the worst-case quadratic growth. We validate this assumption experimentally for Gaussian kernel matrices encountered in various settings, such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption for unrestricted vectors.
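To make the problem statement concrete, the following minimal sketch builds the dense matrix $K$ from the definition above, computes $Kx$ by the naive quadratic-time method, and checks the error condition $\|Kx-y\|_2\leq ε\|x\|_2$. It is only a reference baseline, not the paper's subquadratic algorithm. The function names, the random test data, and the values of `n`, `d`, and `sigma` are illustrative assumptions that do not come from the paper.

```python
import math
import random


def gaussian_kernel_matrix(queries, keys, sigma):
    """Dense K with K[i][j] = exp(-||q_i - k_j||_2^2 / (2 * sigma^2))."""
    K = []
    for q in queries:
        row = []
        for k in keys:
            sq_dist = sum((a - b) ** 2 for a, b in zip(q, k))
            row.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        K.append(row)
    return K


def matvec(K, x):
    """Naive matrix-vector product; quadratic in n for an n-by-n matrix."""
    return [sum(k_ij * x_j for k_ij, x_j in zip(row, x)) for row in K]


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


def within_error(K, x, y, eps):
    """Check the target guarantee ||Kx - y||_2 <= eps * ||x||_2."""
    exact = matvec(K, x)
    residual = [a - b for a, b in zip(exact, y)]
    return l2_norm(residual) <= eps * l2_norm(x)


def entry_sum(K):
    """Sum of all entries of K, the quantity the sparsity assumption bounds."""
    return sum(sum(row) for row in K)


# Illustrative data (assumed, not from the paper): random queries and keys.
random.seed(0)
n, d, sigma = 50, 4, 0.5
queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
x = [random.gauss(0, 1) for _ in range(n)]

K = gaussian_kernel_matrix(queries, keys, sigma)
y_exact = matvec(K, x)
print("entry sum:", entry_sum(K), "worst-case bound n^2:", n * n)
print("exact product meets the error condition:", within_error(K, x, y_exact, 1e-9))
```

Every entry of $K$ lies in $(0, 1]$, so the entry sum can never exceed $n^2$. The modelling assumption says that on the matrices of interest it stays proportional to $n$ instead. Comparing `entry_sum(K)` against `n` for growing `n` is one simple way to probe that assumption on a given dataset.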

