ST LG ML Nov 3, 2020

Convergence of Graph Laplacian with kNN Self-tuned Kernels

arXiv:2011.01479v2 10.829 citations

Originality: Incremental advance

AI Analysis

The paper provides theoretical foundations for graph-based methods in unsupervised learning, filling a gap in the convergence theory for self-tuned kernels on manifold data. The advance is incremental, since it builds on existing kernel and graph Laplacian frameworks.

The paper addresses the incomplete convergence theory for graph Laplacians with kNN self-tuned kernels on manifold data. It proves point-wise convergence and convergence of the graph Dirichlet form, both with rates, and shows that self-tuned kernels have smaller variance error in low-density regions than fixed-bandwidth kernels, without requiring prior knowledge of the data density or dimension.

The kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij} = k_0\left( \frac{\| x_i - x_j \|^2}{\sigma^2} \right)$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice, the self-tuned kernel, adaptively sets a $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When the $x_i$ are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, theoretical results on graph Laplacian convergence with self-tuned kernels have been incomplete, unlike those for fixed-bandwidth kernels. This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0\left( \frac{\| x_i - x_j \|^2}{\epsilon \hat{\rho}(x_i) \hat{\rho}(x_j)} \right) / \left( \hat{\rho}(x_i)^\alpha \hat{\rho}(x_j)^\alpha \right)$, where $\hat{\rho}$ is the bandwidth function estimated by kNN, and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\Delta_p$. Specifically, we prove the point-wise convergence of $L_N f$ and the convergence of the graph Dirichlet form, both with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hat{\rho}$, which bounds the relative estimation error $|\hat{\rho} - \bar{\rho}|/\bar{\rho}$ uniformly with high probability, where $\bar{\rho} = p^{-1/d}$ and $p$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or the data density is needed. The theoretical results are supported by numerical experiments on simulated data and handwritten digit image data.
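
To make the kernel construction concrete, here is a minimal NumPy sketch of the matrix $W^{(\alpha)}$ described above. It is an illustration under stated assumptions, not the authors' implementation: the function names `self_tuned_kernel` and `random_walk_laplacian` are hypothetical, $k_0(r) = e^{-r}$ is an assumed choice of kernel profile, $\hat{\rho}(x_i)$ is taken to be the raw distance from $x_i$ to its $k$-th nearest neighbor (any constant normalization is assumed to be absorbed into $\epsilon$), and the random-walk form $D^{-1}W - I$ is one common graph Laplacian that may differ from the paper's $L_N$ in scaling.

```python
import numpy as np

def self_tuned_kernel(X, k, eps, alpha=1.0):
    """Sketch of W_ij = k0(|x_i - x_j|^2 / (eps * rho_i * rho_j)) / (rho_i * rho_j)^alpha.

    Assumptions (not taken from the source): k0(r) = exp(-r), and rho_i is the
    raw distance from x_i to its k-th nearest neighbor.
    """
    # Pairwise squared Euclidean distances; the diagonal is exactly zero.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # After sorting each row, column 0 is the point itself, so column k is
    # the squared distance to the k-th nearest neighbor.
    rho = np.sqrt(np.sort(d2, axis=1)[:, k])
    rho_prod = np.outer(rho, rho)
    W = np.exp(-d2 / (eps * rho_prod)) / rho_prod ** alpha
    return W, rho

def random_walk_laplacian(W):
    """Assumed normalization: D^{-1} W - I, whose rows sum to zero."""
    return W / W.sum(axis=1, keepdims=True) - np.eye(W.shape[0])

if __name__ == "__main__":
    # Toy data: points sampled on the unit circle, a 1-dimensional manifold in R^2.
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 2.0 * np.pi, size=200)
    X = np.column_stack([np.cos(t), np.sin(t)])
    W, rho = self_tuned_kernel(X, k=10, eps=0.5, alpha=1.0)
    L = random_walk_laplacian(W)
    print(W.shape, float(rho.min()), float(np.abs(L.sum(axis=1)).max()))
```

Note that the sketch never uses the intrinsic dimension $d$ or the density $p$, consistent with the abstract's remark that the algorithm needs neither. The dense $N \times N$ distance array is chosen for readability and would need a sparse kNN search for large $N$.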
