LG, NA, ML · Mar 1, 2024

Sketching the Heat Kernel: Using Gaussian Processes to Embed Data

arXiv:2403.07929v1 · h-index: 3
Originality: Incremental advance
AI Analysis

The paper offers a new embedding technique for data analysis, but it builds on prior theoretical work, and its application appears incremental.

The paper tackles the problem of embedding data in low-dimensional Euclidean space by introducing a non-deterministic method based on Gaussian processes with a heat kernel covariance, which approximates diffusion distances probabilistically and shows robustness to outliers.

This paper introduces a novel, non-deterministic method for embedding data in low-dimensional Euclidean space based on computing realizations of a Gaussian process depending on the geometry of the data. This type of embedding first appeared in (Adler et al., 2018) as a theoretical model for a generic manifold in high dimensions. In particular, we take the covariance function of the Gaussian process to be the heat kernel, and computing the embedding amounts to sketching a matrix representing the heat kernel. The Karhunen-Loève expansion reveals that the straight-line distances in the embedding approximate the diffusion distance in a probabilistic sense, avoiding the need for sharp cutoffs and maintaining some of the smaller-scale structure. Our method has the further advantage of being robust to outliers. We justify the approach with both theory and experiments.
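
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the graph construction, the normalization, or the sketching scheme. The sketch assumes a Gaussian-weighted, unnormalized graph Laplacian L, takes the heat kernel to be K = exp(-tL), and uses a dense eigendecomposition for clarity. The function name `heat_kernel_gp_embedding` and the parameters `k`, `t`, and `bandwidth` are hypothetical. Each embedding coordinate is one realization of a Gaussian process with covariance K, so the expected squared distance between embedded points i and j is K[i,i] + K[j,j] - 2K[i,j], which for a symmetric Laplacian is the squared diffusion distance at time t/2.

```python
import numpy as np

def heat_kernel_gp_embedding(X, k=2, t=1.0, bandwidth=1.0, seed=0):
    """Embed the rows of X in R^k using k realizations of a Gaussian
    process whose covariance is a graph heat kernel (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between data points.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Assumed graph: Gaussian affinities, unnormalized Laplacian L = D - W.
    W = np.exp(-d2 / (2.0 * bandwidth**2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W

    # L is symmetric positive semidefinite, so L = V diag(lam) V^T.
    lam, V = np.linalg.eigh(L)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off

    # Heat kernel K = exp(-t L) and its symmetric square root exp(-t L / 2).
    K = (V * np.exp(-t * lam)) @ V.T
    K_half = (V * np.exp(-0.5 * t * lam)) @ V.T

    # Each column of K_half @ G is one draw from N(0, K); scaling by
    # 1/sqrt(k) makes E|y_i - y_j|^2 = K_ii + K_jj - 2 K_ij.
    G = rng.standard_normal((n, k))
    Y = K_half @ G / np.sqrt(k)
    return Y, K
```

Calling `Y, K = heat_kernel_gp_embedding(X, k=3, t=0.5)` returns one random embedding of the rows of `X` in three dimensions; a different `seed` gives a different realization. Because every eigenvalue contributes with weight exp(-t·lam) and none is truncated, this sketch has no sharp spectral cutoff, in line with the abstract's remark. For large datasets the dense eigendecomposition would be impractical and would need to be replaced by an approximate method.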
