Impact of signal-to-noise ratio and bandwidth on graph Laplacian spectrum from high-dimensional noisy point cloud
This work takes a step toward a theoretical understanding of how practitioners apply the graph Laplacian to noisy datasets, a common situation for data scientists working with real-world high-dimensional data.
This paper investigates the spectrum of the kernel-based graph Laplacian (GL) constructed from high-dimensional, noisy random point clouds, where the clean signals lie in a low-dimensional Euclidean subspace. It quantifies the interaction between signal and noise across different signal-to-noise ratio (SNR) regions, reports the resulting peculiar spectral behavior, and explores the impact of the kernel bandwidth on the GL spectrum, which leads to an adaptive bandwidth choice.
We systematically study the spectrum of the kernel-based graph Laplacian (GL) constructed from a high-dimensional, noisy random point cloud in the nonnull setup. The problem is motivated by the model in which the clean signal is sampled from a manifold embedded in a low-dimensional Euclidean subspace and is corrupted by high-dimensional noise. We quantify how the signal and noise interact over different regions of the signal-to-noise ratio (SNR), and report the resulting peculiar spectral behavior of the GL. In addition, we explore the impact of the chosen kernel bandwidth on the spectrum of the GL over different regions of SNR, which leads to an adaptive choice of kernel bandwidth that coincides with common practice on real data. This result paves the way to a theoretical understanding of how practitioners apply the GL when the dataset is noisy.
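
To make the setup concrete, the following is a minimal numerical sketch of it, not the construction analyzed in the paper. Everything specific here is an assumption made for illustration: the clean signal is a circle in a 2-dimensional subspace, the noise is isotropic Gaussian, the kernel is Gaussian, the bandwidth is taken as a quantile of the pairwise squared distances (one common data-driven rule), and the GL is represented through the transition matrix D^{-1}W. The function names and the `snr` scaling are hypothetical.

```python
import numpy as np


def noisy_point_cloud(n=200, p=400, snr=50.0, seed=0):
    """Sample n points in R^p: a circle in a 2-D subspace plus Gaussian noise.

    The circle radius sqrt(snr) is an illustrative way to set signal strength
    relative to unit-variance noise; it is an assumption, not the paper's scaling.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    signal = np.zeros((n, p))
    signal[:, 0] = np.sqrt(snr) * np.cos(theta)
    signal[:, 1] = np.sqrt(snr) * np.sin(theta)
    noise = rng.standard_normal((n, p))
    return signal + noise


def kernel_affinity(X, quantile=0.5):
    """Gaussian affinity matrix with a quantile-based bandwidth (assumed rule)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip round-off negatives
    np.fill_diagonal(sq_dists, 0.0)
    off_diag = sq_dists[~np.eye(len(X), dtype=bool)]
    bandwidth = np.quantile(off_diag, quantile)
    W = np.exp(-sq_dists / bandwidth)
    return W, bandwidth


def transition_spectrum(W):
    """Eigenvalues of D^{-1} W in descending order.

    They are computed from the symmetric matrix D^{-1/2} W D^{-1/2},
    which is similar to D^{-1} W and therefore has the same eigenvalues.
    """
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    return np.linalg.eigvalsh(S)[::-1]


if __name__ == "__main__":
    for snr in (1.0, 50.0, 2000.0):
        X = noisy_point_cloud(snr=snr)
        W, h = kernel_affinity(X)
        ev = transition_spectrum(W)
        print(f"snr={snr:g}  bandwidth={h:.1f}  top eigenvalues={np.round(ev[:5], 4)}")
```

Running the script for several `snr` values and several `quantile` settings gives a rough feel for how the leading eigenvalues move with signal strength and bandwidth. It does not reproduce the SNR regions or the adaptive bandwidth characterized in the paper.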