NALGDec 24, 2022

Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach

arXiv:2212.12674v29 citationsh-index: 22
Originality Incremental advance
AI Analysis

This provides a scalable solution for applications like Gaussian process regression with large datasets, though it appears incremental as it builds on geometric selection ideas.

The paper tackles the problem of efficiently approximating large rectangular kernel matrices for arbitrarily distributed point sets, achieving a method that scales linearly with data size for a fixed rank.

A general, {\em rectangular} kernel matrix may be defined as $K_{ij} = κ(x_i,y_j)$ where $κ(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are arbitrarily distributed, such as away from each other, ``intermingled'', identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function, and avoid forming the matrix, and thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linear with respect to the size of data for a fixed approximation rank. The main idea in this paper is to {\em geometrically} select appropriate subsets of points to construct a low rank approximation. An analysis in this paper guides how this selection should be performed.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes