ML LG STOct 30, 2024

Improved convergence rate of kNN graph Laplacians

arXiv:2410.23212v17.52 citationsh-index: 1

Originality Incremental advance

AI Analysis

This work provides theoretical guarantees for graph-based data analysis methods, which is incremental but important for researchers in machine learning and statistics dealing with manifold learning and spectral clustering.

The paper tackles the problem of proving convergence rates for k-nearest neighbor graph Laplacians to manifold operators under manifold data settings, achieving an improved rate of O(N^{-2/(d+6)}) with optimal parameter choices under certain regularity conditions, and a slower rate of O(N^{-1/(d+4)}) for lower regularities, validated by numerical experiments.

In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN graph where the $k$NN distance is used to set the kernel bandwidth adaptively. In this work, we consider a general class of $k$NN graph where the graph affinity is $W_{ij} = ε^{-d/2} \; k_0 ( \| x_i - x_j \|^2 / εφ( \widehatρ(x_i), \widehatρ(x_j) )^2 ) $, with $\widehatρ(x)$ being the (rescaled) $k$NN distance at the point $x$, $φ$ a symmetric bi-variate function, and $k_0$ a non-negative function on $[0,\infty)$. Under the manifold data setting, where $N$ i.i.d. samples $x_i$ are drawn from a density $p$ on a $d$-dimensional unknown manifold embedded in a high dimensional Euclidean space, we prove the point-wise convergence of the $k$NN graph Laplacian to the limiting manifold operator (depending on $p$) at the rate of $O(N^{-2/(d+6)}\,)$, up to a log factor, when $k_0$ and $φ$ have $C^3$ regularity and satisfy other technical conditions. This fast rate is obtained when $ε\sim N^{-2/(d+6)}\,$ and $k \sim N^{6/(d+6)}\,$, both at the optimal order to balance the theoretical bias and variance errors. When $k_0$ and $φ$ have lower regularities, including when $k_0$ is a compactly supported function as in the standard $k$NN graph, the convergence rate degenerates to $O(N^{-1/(d+4)}\,)$. Our improved convergence rate is based on a refined analysis of the $k$NN estimator, which can be of independent interest. We validate our theory by numerical experiments on simulated data.

View on arXiv PDF

Similar