Nonparametric Nearest Neighbor Random Process Clustering
This addresses clustering challenges in signal processing or data analysis where process models are unknown, but it appears incremental as it builds on existing methods like spectral clustering and k-means with new analysis.
The paper tackles clustering of noisy observations from stationary ergodic random processes without prior knowledge of model statistics or cluster count, showing that two algorithms (NNPC and KM) succeed with high probability under noise and significant PSD overlap given sufficient observation length, with results quantifying tradeoffs between overlap, noise, and length.
We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their nonparametric generative models without prior knowledge of the model statistics and the number of generative models. Two algorithms, both using the L1-distance between estimated power spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The first algorithm, termed nearest neighbor process clustering (NNPC), to the best of our knowledge, is new and relies on partitioning the nearest neighbor graph of the observations via spectral clustering. The second algorithm, simply referred to as k-means (KM), consists of a single k-means iteration with farthest point initialization and was considered before in the literature, albeit with a different measure of dissimilarity and with asymptotic performance results only. We show that both NNPC and KM succeed with high probability under noise and even when the generative process PSDs overlap significantly, all provided that the observation length is sufficiently large. Our results quantify the tradeoff between the overlap of the generative process PSDs, the noise variance, and the observation length. Finally, we present numerical performance results for synthetic and real data.